Re: Small ifconfig output tweak for inet6?
On 26 March 2015 at 14:27, Stuart Henderson st...@openbsd.org wrote: seems reasonable. (I'd quite like that for v4 too, though it wouldn't cope with non-contiguous netmask ;) non-contiguous netmasks for IPv4 addresses configured on an interface? is that possible? what's the use case? perhaps you're confusing this with non-contiguous netmasks in the radix tree that are entered by the ipsec flows containing port numbers? however I agree that if we do this for ipv6 we should do it for ipv4 as well but then do we care about tons of stuff out there parsing ifconfig output?
Re: Avoid doing IPv6 SLAAC for prefixes with preferred lifetime of zero
On 9 March 2015 at 16:37, Stuart Henderson st...@openbsd.org wrote: whitespace nit here (5 char indent), otherwise OK done on purpose.
Avoid doing IPv6 SLAAC for prefixes with preferred lifetime of zero
Hi, It looks like Mac OS X puts some IPv6 garbage on the wire and our cheap consumer router starts happily advertising routes like this: 4006:16e1:ac17:189::/64 if=re0 flags=LAO vltime=6401, pltime=0, expire=1h46m34s, ref=0 advertised by fe80::9ec7:a6ff:fe86:a3f4%re0 (reachable) This makes us generate autoconf and privacy addresses which immediately become deprecated (pltime=0 translates to a zero preferred lifetime for these addresses). Within minutes on an active network you end up with hundreds of dead IPv6 addresses. RFC 4941 says in 3.3.5: In particular, an implementation MUST NOT create a temporary address with a zero Preferred Lifetime. Which means we have to strengthen the lifetime check. OK? This also raises the question whether we should memorize such prefixes at all -- but I haven't found a straight answer just yet. diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c index de23248..d7f59de 100644 --- sys/netinet6/nd6_rtr.c +++ sys/netinet6/nd6_rtr.c @@ -1374,16 +1374,17 @@ prelist_update(struct nd_prefix *new, struct nd_defrouter *dr, struct mbuf *m) ia6-ia6_lifetime = lt6_tmp; ia6-ia6_updatetime = time_second; } if ((!autoconf || ((ifp-if_xflags IFXF_INET6_NOPRIVACY) == 0 - !tempaddr_preferred)) new-ndpr_vltime != 0 +!tempaddr_preferred)) + new-ndpr_vltime != 0 new-ndpr_pltime != 0 !((ifp-if_xflags IFXF_INET6_NOPRIVACY) statique)) { /* * There is no SLAAC address and/or there is no preferred RFC -* 4941 temporary address. And the valid prefix lifetime is -* non-zero. And there is no static address in the same prefix. +* 4941 temporary address. And prefix lifetimes are non-zero. +* And there is no static address in the same prefix. * Create new addresses in process context. * Increment prefix refcount to ensure the prefix is not * removed before the task is done. */ pr-ndpr_refcnt++;
Re: splassert: rtrequest1: want 5 have 0
On 19 February 2015 at 21:30, Alexander Bluhm alexander.bl...@gmx.net wrote: On Wed, Feb 18, 2015 at 12:14:15PM +0100, Matthieu Herrb wrote: Feb 18 12:09:59 castor /bsd: splassert: rtrequest1: want 5 have 0 Feb 18 12:09:59 castor /bsd: Starting stack trace... Feb 18 12:09:59 castor /bsd: splassert_check() at splassert_check+0x78 Feb 18 12:09:59 castor /bsd: rtrequest1() at rtrequest1+0x5e Feb 18 12:09:59 castor /bsd: nd6_prefix_offlink() at nd6_prefix_offlink+0x1bf Feb 18 12:09:59 castor /bsd: pfxlist_onlink_check() at pfxlist_onlink_check+0x25e Feb 18 12:09:59 castor /bsd: in6_control() at in6_control+0x894 Feb 18 12:09:59 castor /bsd: ifioctl() at ifioctl+0x175 Feb 18 12:09:59 castor /bsd: sys_ioctl() at sys_ioctl+0x169 Feb 18 12:09:59 castor /bsd: syscall() at syscall+0x297 Feb 18 12:09:59 castor /bsd: --- syscall (number 54) --- Feb 18 12:09:59 castor /bsd: end of kernel Feb 18 12:09:59 castor /bsd: end trace frame: 0xc8115948400, count: 249 Feb 18 12:09:59 castor /bsd: 0xc8115715cda: Feb 18 12:09:59 castor /bsd: End of stack trace. Feb 18 12:10:00 castor /bsd: carp0: state transition: BACKUP - MASTER Most calls to pfxlist_onlink_check() are protected by splsoftnet. Only the path in your trace does not set it. So I suggest to set splsoftnet() in in6_control(). I have included the dohooks() as this is done in IPv4. While there I have moved some splsoftnet() hiding in the declarations to the beginning of the code. ok? bluhm OK, thanks for taking a look!
Re: libpcap use after free
On 15 January 2015 at 03:53, Lawrence Teo l...@openbsd.org wrote: libpcap has a use after free (found via LLVM). pcap_close() currently looks like this: void pcap_close(pcap_t *p) { if (p-opt.source != NULL) free(p-opt.source); pcap_cleanup_bpf(p); free(p); } The bug affects libpcap programs that enable monitor mode on 802.11 devices (i.e. if they call pcap_set_rfmon() followed by pcap_activate()). If pcap_close() is called after that, pcap_cleanup_bpf() will attempt to use p-opt.source while trying to disable monitor mode, resulting in a use after free. The fix is simple (diff below). I tested this with a small program that calls pcap_create(), pcap_set_rfmon(), pcap_activate(), and pcap_close() on an iwn(4) device with MALLOC_OPTIONS=AFGJPRX. With the diff applied, the test program no longer segfaults. ok? Looks good to me. OK mikeb
Re: Kill IPv4 list of addresses
On 6 January 2015 at 13:26, Martin Pieuchot mpieuc...@nolizard.org wrote: Diff below remove the last use of the global IPv4 list of addresses. The code using it is a hack to move the unique cloning route of a subnet from one ifa to another. I know a proper fix would be to use multipath for that, but this is not possible feasible right now because we cannot select multipath route entries based on a different ifa. In the meantime this allow us to simplify in_ifinit() which still needs better error handling. ok? Looks good to me.
Re: idle pool page gc
On 22 December 2014 at 06:43, David Gwynne da...@gwynne.id.au wrote: this introduces a global gc task that loops over all the pools looking for pages that havent been used for a very long time so they can be freed. this is the simplest way of doing this without introducing per pool timeouts/tasks which in turn could introduce races with pool_destroy, or more shared global state that gets touched on pool_get/put ops. im adding the timeout in src/sys/kern/init_main.c cos having pool_init do it causes faults when a pool is created before the timeout subsystem is set up. i have tested this on sparc64, amd64, and hppa, and it seems to do the advertised job. i have a couple of questions to understand the reasoning behind this. 1) how is it different from what we have now? 2) why can't you do it when you pool_put? 3) why can't you call it from uvm when there's memory pressure since from what i understand pool_reclaim was supposed to work like that? 4) i assume you don't want to call pool_reclaim from pool_gc_pages because of the mutex_enter_try benefits, but why does logic in these functions differ, e.g. why did you omit the pr_itemsperpage bit? 5) why 8 ticks is a very long time to free a page? why not 10 seconds? 6) why is there an XXX next to the splx? 7) it looks like pool_reclaim_all should also raise an IPL since it does the same thing. wasn't it noteced before?
Re: divert(4) m_pullup
On 16 December 2014 at 12:08, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Mon, 15 Dec 2014 23:44:54 -0500 From: Lawrence Teo l...@openbsd.org Make divert_output() do an m_pullup only if truly needed. ok? Questionable. AFAIK m_pullup(9) will only do the pullup if it is necesary in the first place. I agree. m_pullup already checks that. Is there a measurable speedup from inlining the check?
Re: page fault at resume, possibly caused by java/jenkins
On 11 December 2014 at 14:16, Martin Pieuchot mpieuc...@nolizard.org wrote: On 11/12/14(Thu) 12:37, frantisek holop wrote: login: kernel: page fault trap, code=0 Stopped at in_selectsrc+0xd8: movl 0xf0(%esi),%ebx ddb{0} trace in_selectsrc(f617cdc8,d80b6714,d35d9270,d8cfac44,d8cfac34) at in_selectsrc+0xd8 udp_output(d8cfabfc,d80b6900,d80b6700,0,0) at udp_output+0xf8 sosend(d7f881c8,d80b6700,f617ce90,d80b6900,0) at sosend+0x44b sendit(d84f4b40,103,f617cef4,0,f617cf80) at sendit+0x1e1 sys_sendto(d84f4b40,f617cf60,f617cf80,d0568b5a,d84f4b40) at sys_sendto+0x6c syscall() at syscall+0x144 --- syscall (number 259) --- 0x2: ddb{0} run0 usb dongle + dhclient through a home router, with one special route added that is forwarded through a linux box (on the same home router network) which is connected to a vpn in a different country. it is added like this: # route add 192.168.10.2 $LINUX_IP jenkins is running a shell script, that is running tests on the 192.168.10.2 web site accessible only through the vpn. the tests are simple selenium tests recorded through the firefox plugin. a quick google reveals that jenkins itself might be using some udp/multicast functionality: https://wiki.jenkins-ci.org/display/JENKINS/Auto-discovering+Jenkins+on+the+network Could you try the diff below and let me know if it improves the situation? It should prevent your kernel to panic. Even though, more work might be required to have working socket w/ multicast using USB devices upon resume. What's happening is simply a use after free because there's no way to invalidate multicast options attached to a pcb when an interface is detached. I which multicast would have been implemented using route entries instead of this hack... Change below works around this problem by using interface indexes. Makes sense to me. OK mikeb
Re: tcpdump follows incorrect alignment requirement rules
On 9 December 2014 at 19:00, Christian Weisgerber na...@mips.inka.de wrote: On 2014-12-03, Mike Belopuhov m...@belopuhov.com wrote: bpf aligns data following the datalink header (e.g. ethernet) on the BPF_ALIGNMENT boundary. Since rev1.41 of bpf.h it's uint32_t instead of a long. And also since then almost all packets become unaligned from the tcpdump perspective and require costly copies into the internal buffer. Neither IP header (struct ip) nor IPv6 (struct ip6_hdr) have fields larger than 32 bits and therefore alignment requirements for them are at most 32 bit. I've successfully tested this diff on sparc64 and amd64. Oops, turns out I've had the same diff in my tree on my sparc64 for a long time (last year?). No ill effects, but I do little tcpdumping on that machine. I think you have forgotten a chunk in print-atalk.c: I did not. I can't test AppleTalk, so if you can, please do so. By analogy it looks correct (:
Call for testing of watchdog devices
Hi, We plan to remove shutdown hooks for good and need to convert all drivers implementing a watchdog(4) device to the config_* framework namely implementing the activate method with a DVACT_POWERDOWN action handler fleshed out. This is the diff I came up with. wdog_shutdown will now check that the registered watchdog is the one we're disabling. This compiles on amd64, i386 and sparc64, works fine on glxpcib (the only one we have access to). Here's a list of drivers that get changed: armv7/omap/omdog armv7/sunxi/sxidog i386/esm i386/elansc i386/geodesc sgi/imc sparc64/lom sparc64/pmc ipmi (not enabled) fins, it, schsio, viasio berkwdt, glxpcib, ichwdt, pwdog, tcpcib, wdt Tests and OK's are welcome. diff --git sys/arch/armv7/omap/omdog.c sys/arch/armv7/omap/omdog.c index a8fb5d4..166bc11 100644 --- sys/arch/armv7/omap/omdog.c +++ sys/arch/armv7/omap/omdog.c @@ -55,18 +55,19 @@ struct omdog_softc { }; struct omdog_softc *omdog_sc; void omdog_attach(struct device *, struct device *, void *); +intomdog_activate(struct device *, int); void omdog_start(struct omdog_softc *); void omdog_stop(struct omdog_softc *); void omdog_sync(struct omdog_softc *); intomdog_cb(void *, int); void omdog_reset(void); struct cfattachomdog_ca = { - sizeof (struct omdog_softc), NULL, omdog_attach + sizeof (struct omdog_softc), NULL, omdog_attach, NULL, omdog_activate }; struct cfdriver omdog_cd = { NULL, omdog, DV_DULL }; @@ -93,10 +94,24 @@ omdog_attach(struct device *parent, struct device *self, void *args) #ifndef SMALL_KERNEL wdog_register(omdog_cb, sc); #endif } +int +omdog_activate(struct device *self, int act) +{ + switch (act) { + case DVACT_POWERDOWN: +#ifndef SMALL_KERNEL + wdog_shutdown(self); +#endif + break; + } + + return (0); +} + void omdog_start(struct omdog_softc *sc) { /* Write the enable sequence data h followed by h */ bus_space_write_4(sc-sc_iot, sc-sc_ioh, WSPR, 0x); diff --git sys/arch/armv7/sunxi/sxidog.c sys/arch/armv7/sunxi/sxidog.c index ab327dd..c59660f 100644 --- sys/arch/armv7/sunxi/sxidog.c +++ sys/arch/armv7/sunxi/sxidog.c @@ -64,18 +64,20 @@ struct sxidog_softc { }; struct sxidog_softc *sxidog_sc = NULL; /* for sxidog_reset() */ void sxidog_attach(struct device *, struct device *, void *); +int sxidog_activate(struct device *, int); int sxidog_callback(void *, int); #if 0 int sxidog_intr(void *); #endif void sxidog_reset(void); struct cfattachsxidog_ca = { - sizeof (struct sxidog_softc), NULL, sxidog_attach + sizeof (struct sxidog_softc), NULL, sxidog_attach, + NULL, sxidog_activate }; struct cfdriver sxidog_cd = { NULL, sxidog, DV_DULL }; diff --git sys/arch/i386/i386/esm.c sys/arch/i386/i386/esm.c index 54f9f6d..40a278b 100644 --- sys/arch/i386/i386/esm.c +++ sys/arch/i386/i386/esm.c @@ -43,10 +43,11 @@ int esmdebug = 3; #define DPRINTFN(n,x...) /* n: x */ #endif intesm_match(struct device *, void *, void *); void esm_attach(struct device *, struct device *, void *); +intesm_activate(struct device *, int); enum esm_sensor_type { ESM_S_UNKNOWN, /* XXX */ ESM_S_INTRUSION, ESM_S_TEMP, @@ -122,11 +123,12 @@ struct esm_softc { int sc_wdog_period; volatile intsc_wdog_tickle; }; struct cfattach esm_ca = { - sizeof(struct esm_softc), esm_match, esm_attach + sizeof(struct esm_softc), esm_match, esm_attach, + NULL, esm_activate }; struct cfdriver esm_cd = { NULL, esm, DV_DULL }; @@ -272,10 +274,22 @@ esm_attach(struct device *parent, struct device *self, void *aux) timeout_add_sec(sc-sc_timeout, 1); } } int +esm_activate(struct device *self, int act) +{ + switch (act) { + case DVACT_POWERDOWN: + wdog_shutdown(self); + break; + } + + return (0); +} + +int esm_watchdog(void *arg, int period) { struct esm_softc*sc = arg; struct esm_wdog_propprop; struct esm_wdog_state state; diff --git sys/arch/i386/pci/elan520.c sys/arch/i386/pci/elan520.c index bf077dd..b97476b 100644 --- sys/arch/i386/pci/elan520.c +++ sys/arch/i386/pci/elan520.c @@ -68,10 +68,11 @@ struct elansc_softc { struct timecounter sc_tc; } *elansc; intelansc_match(struct device *, void *, void *); void elansc_attach(struct device *, struct device *, void *); +intelansc_activate(struct device *, int); void elansc_update_cpuspeed(void); void elansc_setperf(int); intelansc_cpuspeed(int *); void elansc_wdogctl(struct elansc_softc *, int, uint16_t); @@ -84,11 +85,12 @@ voidelansc_gpio_pin_write(void *, int, int); void elansc_gpio_pin_ctl(void *, int, int); u_int
Re: tcpdump follows incorrect alignment requirement rules
On Wed, Dec 03, 2014 at 14:56 +0100, Mike Belopuhov wrote: bpf aligns data following the datalink header (e.g. ethernet) on the BPF_ALIGNMENT boundary. Since rev1.41 of bpf.h it's uint32_t instead of a long. And also since then almost all packets become unaligned from the tcpdump perspective and require costly copies into the internal buffer. Neither IP header (struct ip) nor IPv6 (struct ip6_hdr) have fields larger than 32 bits and therefore alignment requirements for them are at most 32 bit. I've successfully tested this diff on sparc64 and amd64. OK? Still looking for OKs. diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c index 1839869..60946a1 100644 --- usr.sbin/tcpdump/print-ip.c +++ usr.sbin/tcpdump/print-ip.c @@ -368,11 +368,11 @@ ip_print(register const u_char *bp, register u_int length) /* * If the IP header is not aligned, copy into abuf. * This will never happen with BPF. It does happen with raw packet * dumps from -r. */ - if ((intptr_t)ip (sizeof(long)-1)) { + if ((intptr_t)ip (sizeof(u_int32_t)-1)) { static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) diff --git usr.sbin/tcpdump/print-ip6.c usr.sbin/tcpdump/print-ip6.c index cca8495..ea3a9b3 100644 --- usr.sbin/tcpdump/print-ip6.c +++ usr.sbin/tcpdump/print-ip6.c @@ -70,11 +70,11 @@ ip6_print(register const u_char *bp, register u_int length) /* * The IP header is not word aligned, so copy into abuf. * This will never happen with BPF. It does happen with * raw packet dumps from -r. */ - if ((intptr_t)ip6 (sizeof(long)-1)) { + if ((intptr_t)ip6 (sizeof(u_int32_t)-1)) { static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) clen = snaplen;
Re: operations on nd_prefix list must take rdomain into account
Ping. On Fri, Nov 28, 2014 at 13:40 +0100, Mike Belopuhov wrote: Still looking for OK's. On Wed, Nov 26, 2014 at 18:24 +0100, Mike Belopuhov wrote: More rdomain checks are needed to be able to use the same subnet in a back to back connection between IPv6 rdomains as pointed out by mpi@. OK? diff --git sys/netinet6/nd6.c sys/netinet6/nd6.c index 9616187..d704cd6 100644 --- sys/netinet6/nd6.c +++ sys/netinet6/nd6.c @@ -1264,10 +1264,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp) s = splsoftnet(); /* First purge the addresses referenced by a prefix. */ LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) { struct in6_ifaddr *ia6, *ia6_next; + if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr)) continue; /* XXX */ /* do we really have to remove addresses as well? */ TAILQ_FOREACH_SAFE(ia6, in6_ifaddr, ia_list, ia6_next) { @@ -1282,10 +1285,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp) * Purging the addresses might remove the prefix as well. * So run the loop again to access only prefixes that have * not been freed already. */ LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) { + if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr)) continue; /* XXX */ prelist_remove(pr); } diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c index bfc9c9f..e46b3b4 100644 --- sys/netinet6/nd6_rtr.c +++ sys/netinet6/nd6_rtr.c @@ -1690,10 +1690,13 @@ nd6_prefix_onlink(struct nd_prefix *pr) * interface, and the prefix has already installed the interface route. * Although such a configuration is expected to be rare, we explicitly * allow it. */ LIST_FOREACH(opr, nd_prefix, ndpr_entry) { + if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (opr == pr) continue; if ((opr-ndpr_stateflags NDPRF_ONLINK) == 0) continue; @@ -1826,10 +1829,13 @@ nd6_prefix_offlink(struct nd_prefix *pr) * the interface route (see comments in nd6_prefix_onlink). * If there's one, try to make the prefix on-link on the * interface. */ LIST_FOREACH(opr, nd_prefix, ndpr_entry) { + if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (opr == pr) continue; if ((opr-ndpr_stateflags NDPRF_ONLINK) != 0) continue;
Re: ix(4) and Unsupported SFP+ Module
On Fri, Dec 05, 2014 at 11:24 +0100, Gabriel Linder wrote: Hi, On a 5.6-release I have an ix card which refuses to work with unsupported SFP+ modules, saying this : ix0 at pci1 dev 0 function 0 Intel 82599 rev 0x01Unsupported SFP+ Module. However, this seems to be an artificial limitation of the ix driver : the following diff (against sys.tar.gz as of 5.6-release) allows my SFP+ and it works fine so far. The patch was made in a hurry, so I probably have cut more than I should... Is there a simpler/better way to fix this ? I think this is roughly what I have recently committed to -current. But hey, thanks for taking your time for looking into this!
tcpdump follows incorrect alignment requirement rules
bpf aligns data following the datalink header (e.g. ethernet) on the BPF_ALIGNMENT boundary. Since rev1.41 of bpf.h it's uint32_t instead of a long. And also since then almost all packets become unaligned from the tcpdump perspective and require costly copies into the internal buffer. Neither IP header (struct ip) nor IPv6 (struct ip6_hdr) have fields larger than 32 bits and therefore alignment requirements for them are at most 32 bit. I've successfully tested this diff on sparc64 and amd64. OK? diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c index 1839869..60946a1 100644 --- usr.sbin/tcpdump/print-ip.c +++ usr.sbin/tcpdump/print-ip.c @@ -368,11 +368,11 @@ ip_print(register const u_char *bp, register u_int length) /* * If the IP header is not aligned, copy into abuf. * This will never happen with BPF. It does happen with raw packet * dumps from -r. */ - if ((intptr_t)ip (sizeof(long)-1)) { + if ((intptr_t)ip (sizeof(u_int32_t)-1)) { static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) diff --git usr.sbin/tcpdump/print-ip6.c usr.sbin/tcpdump/print-ip6.c index cca8495..ea3a9b3 100644 --- usr.sbin/tcpdump/print-ip6.c +++ usr.sbin/tcpdump/print-ip6.c @@ -70,11 +70,11 @@ ip6_print(register const u_char *bp, register u_int length) /* * The IP header is not word aligned, so copy into abuf. * This will never happen with BPF. It does happen with * raw packet dumps from -r. */ - if ((intptr_t)ip6 (sizeof(long)-1)) { + if ((intptr_t)ip6 (sizeof(u_int32_t)-1)) { static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) clen = snaplen;
Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned
Still looking for OK's. On Wed, Nov 26, 2014 at 17:18 +0100, Mike Belopuhov wrote: better diff. the problem is that dissectors use packetp and snapend pointers themselves therefore they should be pointing to the newly allocated structure. we can restore them once we're done with the inner content and go back to the caller to see if we need to hexdump the contents. i'll see if i can cook and test the ipv6 version. OK? now with an ip6 version and i've made sure that this fixes dumping unaligned ipv6 packets as well. in the meantime jsg@ has lured me into looking at the afl crash in the same code and it looks like the check from ip6_print is useful here: if we haven't got enough data for a header, don't bother with anything else and just bail. ok? diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c index 3f4194c..e9d2185 100644 --- usr.sbin/tcpdump/print-ip.c +++ usr.sbin/tcpdump/print-ip.c @@ -351,22 +351,27 @@ in_cksum(const u_short *addr, register int len, int csum) * print an IP datagram. */ void ip_print(register const u_char *bp, register u_int length) { + static u_char *abuf = NULL; register const struct ip *ip; register u_int hlen, len, off; register const u_char *cp; + const u_char *pktp = packetp; + const u_char *send = snapend; ip = (const struct ip *)bp; + if ((u_char *)(ip + 1) snapend) { + printf([|ip]); + return; + } + /* * If the IP header is not aligned, copy into abuf. - * This will never happen with BPF. It does happen with raw packet - * dumps from -r. */ if ((intptr_t)ip (sizeof(long)-1)) { - static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) clen = snaplen; @@ -387,11 +392,11 @@ ip_print(register const u_char *bp, register u_int length) } TCHECK(*ip); if (ip-ip_v != IPVERSION) { (void)printf(bad-ip-version %u, ip-ip_v); - return; + goto out; } len = ntohs(ip-ip_len); if (length len) { (void)printf(truncated-ip - %d bytes missing!, @@ -400,11 +405,11 @@ ip_print(register const u_char *bp, register u_int length) } hlen = ip-ip_hl * 4; if (hlen sizeof(struct ip) || hlen len) { (void)printf(bad-hlen %d, hlen); - return; + goto out; } len -= hlen; /* @@ -465,11 +470,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst)); ip_print(cp, len); if (! vflag) { printf( (encap)); - return; + goto out; } break; #ifdef INET6 #ifndef IPPROTO_IPV6 @@ -482,11 +487,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst)); ip6_print(cp, len); if (! vflag) { printf( (encap)); - return; + goto out; } break; #endif /*INET6*/ #ifndef IPPROTO_GRE @@ -499,11 +504,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_dst)); /* do it */ gre_print(cp, len); if (! vflag) { printf( (gre encap)); - return; + goto out; } break; #ifndef IPPROTO_ESP #define IPPROTO_ESP 50 @@ -528,11 +533,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst)); mobile_print(cp, len); if (! vflag) { printf( (mobile encap)); - return; + goto out; } break; #ifndef IPPROTO_ETHERIP #define IPPROTO_ETHERIP 97 @@ -653,10 +658,13 @@ ip_print(register const u_char *bp, register u_int length) (void)printf(%soptlen=%d, sep, hlen); ip_optprint((u_char *)(ip + 1), hlen
Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned
On 27 November 2014 at 03:12, Theo de Raadt dera...@cvs.openbsd.org wrote: On Tue, Nov 25, 2014 at 18:42 +0100, Mike Belopuhov wrote: On Mon, Nov 24, 2014 at 19:04 +0100, Mike Belopuhov wrote: Hi, IP header is not always aligned since bpf copies out the mbuf chain into the contigous buffer provided by the userland. I've seen this with large packet sizes on VLANs. ip_print will then copy the packet but the Ethernet header into the internal buffer so that it can cast it to the IP header structure and update global packetp and snapend pointers hence preventing the -Xx dumping code from printing out the Ethernet header itself. Diff below fixes it. OK? better diff. the problem is that dissectors use packetp and snapend pointers themselves therefore they should be pointing to the newly allocated structure. we can restore them once we're done with the inner content and go back to the caller to see if we need to hexdump the contents. i'll see if i can cook and test the ipv6 version. OK? now with an ip6 version and i've made sure that this fixes dumping unaligned ipv6 packets as well. in the meantime jsg@ has lured me into looking at the afl crash in the same code and it looks like the check from ip6_print is useful here: if we haven't got enough data for a header, don't bother with anything else and just bail. ok? Did you test on a strict alignment machine? works fine on sparc64.
Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned
On Tue, Nov 25, 2014 at 18:42 +0100, Mike Belopuhov wrote: On Mon, Nov 24, 2014 at 19:04 +0100, Mike Belopuhov wrote: Hi, IP header is not always aligned since bpf copies out the mbuf chain into the contigous buffer provided by the userland. I've seen this with large packet sizes on VLANs. ip_print will then copy the packet but the Ethernet header into the internal buffer so that it can cast it to the IP header structure and update global packetp and snapend pointers hence preventing the -Xx dumping code from printing out the Ethernet header itself. Diff below fixes it. OK? better diff. the problem is that dissectors use packetp and snapend pointers themselves therefore they should be pointing to the newly allocated structure. we can restore them once we're done with the inner content and go back to the caller to see if we need to hexdump the contents. i'll see if i can cook and test the ipv6 version. OK? now with an ip6 version and i've made sure that this fixes dumping unaligned ipv6 packets as well. in the meantime jsg@ has lured me into looking at the afl crash in the same code and it looks like the check from ip6_print is useful here: if we haven't got enough data for a header, don't bother with anything else and just bail. ok? diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c index 3f4194c..e9d2185 100644 --- usr.sbin/tcpdump/print-ip.c +++ usr.sbin/tcpdump/print-ip.c @@ -351,22 +351,27 @@ in_cksum(const u_short *addr, register int len, int csum) * print an IP datagram. */ void ip_print(register const u_char *bp, register u_int length) { + static u_char *abuf = NULL; register const struct ip *ip; register u_int hlen, len, off; register const u_char *cp; + const u_char *pktp = packetp; + const u_char *send = snapend; ip = (const struct ip *)bp; + if ((u_char *)(ip + 1) snapend) { + printf([|ip]); + return; + } + /* * If the IP header is not aligned, copy into abuf. -* This will never happen with BPF. It does happen with raw packet -* dumps from -r. */ if ((intptr_t)ip (sizeof(long)-1)) { - static u_char *abuf = NULL; static int didwarn = 0; int clen = snapend - bp; if (clen snaplen) clen = snaplen; @@ -387,11 +392,11 @@ ip_print(register const u_char *bp, register u_int length) } TCHECK(*ip); if (ip-ip_v != IPVERSION) { (void)printf(bad-ip-version %u, ip-ip_v); - return; + goto out; } len = ntohs(ip-ip_len); if (length len) { (void)printf(truncated-ip - %d bytes missing!, @@ -400,11 +405,11 @@ ip_print(register const u_char *bp, register u_int length) } hlen = ip-ip_hl * 4; if (hlen sizeof(struct ip) || hlen len) { (void)printf(bad-hlen %d, hlen); - return; + goto out; } len -= hlen; /* @@ -465,11 +470,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst)); ip_print(cp, len); if (! vflag) { printf( (encap)); - return; + goto out; } break; #ifdef INET6 #ifndef IPPROTO_IPV6 @@ -482,11 +487,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst)); ip6_print(cp, len); if (! vflag) { printf( (encap)); - return; + goto out; } break; #endif /*INET6*/ #ifndef IPPROTO_GRE @@ -499,11 +504,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_dst)); /* do it */ gre_print(cp, len); if (! vflag) { printf( (gre encap)); - return; + goto out; } break; #ifndef IPPROTO_ESP #define IPPROTO_ESP 50 @@ -528,11 +533,11 @@ ip_print(register const u_char *bp, register u_int length) ipaddr_string(ip-ip_src), ipaddr_string(ip-ip_dst
operations on nd_prefix list must take rdomain into account
More rdomain checks are needed to be able to use the same subnet in a back to back connection between IPv6 rdomains as pointed out by mpi@. OK? diff --git sys/netinet6/nd6.c sys/netinet6/nd6.c index 9616187..d704cd6 100644 --- sys/netinet6/nd6.c +++ sys/netinet6/nd6.c @@ -1264,10 +1264,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp) s = splsoftnet(); /* First purge the addresses referenced by a prefix. */ LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) { struct in6_ifaddr *ia6, *ia6_next; + if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr)) continue; /* XXX */ /* do we really have to remove addresses as well? */ TAILQ_FOREACH_SAFE(ia6, in6_ifaddr, ia_list, ia6_next) { @@ -1282,10 +1285,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp) * Purging the addresses might remove the prefix as well. * So run the loop again to access only prefixes that have * not been freed already. */ LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) { + if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr)) continue; /* XXX */ prelist_remove(pr); } diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c index bfc9c9f..e46b3b4 100644 --- sys/netinet6/nd6_rtr.c +++ sys/netinet6/nd6_rtr.c @@ -1690,10 +1690,13 @@ nd6_prefix_onlink(struct nd_prefix *pr) * interface, and the prefix has already installed the interface route. * Although such a configuration is expected to be rare, we explicitly * allow it. */ LIST_FOREACH(opr, nd_prefix, ndpr_entry) { + if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (opr == pr) continue; if ((opr-ndpr_stateflags NDPRF_ONLINK) == 0) continue; @@ -1826,10 +1829,13 @@ nd6_prefix_offlink(struct nd_prefix *pr) * the interface route (see comments in nd6_prefix_onlink). * If there's one, try to make the prefix on-link on the * interface. */ LIST_FOREACH(opr, nd_prefix, ndpr_entry) { + if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain) + continue; + if (opr == pr) continue; if ((opr-ndpr_stateflags NDPRF_ONLINK) != 0) continue;
Re: iked control process crash at startup
On 25 November 2014 at 13:11, Vincent Gross dermi...@kilob.yt wrote: Hi tech@, I've been using iked for some weeks to tunnel my laptop to home over 3G. Sunday I upgraded my laptop to the latest snapshot; previous upgrade was about 2 or 3 weeks ago. When I started iked, it crashed randomly, as in one time it runs just fine and completes the handshake, the other it crashes before even sending the first packet. I ran ktrace -di /sbin/iked and kdump'd the resulting file. Of the 5 processes, 4 finished by calling exit(0), one was terminated on a SIGSEGV. As it is also the only one that do stuff on /var/run/iked.sock, it is the control process. I repeated the above ktrace 4 times and got consistent results: SIGSEGV'd control process. I'll keep the hunt going, but I am not sure how long this will take nor how much time I'll have to spare, so here is the control process kdump. Cheers, -- Vincent can you please provide iked -dvv output when it crashes and a trace from the core file. to generate a core file compile iked like so: # cd /usr/src/sbin/iked # make clean # make obj # make DEBUG=-g setup core file generation: # mkdir /var/crash/iked # chmod 700 /var/crash/iked # sysctl kern.nosuidcoredump=3 run /usr/src/sbin/iked/obj/iked -dvv and once it crashes retrieve the full backtrace from the core file: # cd /usr/src/sbin/iked # gdb -c /var/crash/iked/4970.core ./obj/iked (gdb) bt full and send this along with the iked output. please make sure that you pick the correct core file so that it corresponds to the debug output.
Re: Simplify in_broadcast()
On 20 November 2014 at 15:24, Martin Pieuchot mpieuc...@nolizard.org wrote: Diff below make the function always iterate on all the interfaces. After that I'd like to change ifa_ifwithaddr() to only match unicast addresses and use in_broadcast() in the few places where we also accept broadcast addresses. This would prevent people from matching a broadcast address when they don't want to. Ok? OK
Re: Rename rt_ifa_addloop(9)
On 20 November 2014 at 15:37, Martin Pieuchot mpieuc...@nolizard.org wrote: When I decided to use in6_ifaddloop() for IPv4 I barely though about the name of the function. Recently mikeb@ told me that the name is confusing, especially because I'm trying to turn the loopback hack into local routes. So here's a diff to rename these functions and to make them return the error code of rtrequest9(9), ignored for the moment. Ok? looks good to me.
Re: Make every interface with a watchdog register it's own timeout
On Sun, Nov 23, 2014 at 13:39 +0100, Claudio Jeker wrote: On Sun, Nov 23, 2014 at 02:10:24AM +0100, Mike Belopuhov wrote: Hi, This removes the system wide if_slowtimo timeout and lets every interface with a valid if_watchdog method register it's own. The rational is to get rid of the ifnet loop in the softclock context to avoid further complications with concurrent access to the ifnet list. This might also save some cpu cycles in the runtime if number of interfaces is large while only a fraction all of them have a watchdog routine. This has originally come up on NetBSD mailing lists with some rather complicated locking strategy. Masao Uebayashi has later proposed a similar solution. The diff was looked at and corrected by mpi@. I've tested the trunk and a system build as well. OK? Please do not use M_TEMP for this (this is not a temporary buffer). There's nothing wrong with M_TEMP usage here. In fact all the other allocations are done with M_TEMP in if.c. Maybe even remove the need to allocate the struct timeout (which requires if_var.h to include sys/timeout.h and adds 24 / 40 bytes to ifnet even when there is no watchdog used (like on vlan)). That would be nice, however, I don't think right now it matters too much. Some userland apps are still using kvm to get their hands on the ifnet structure and I'm not 100% positive we should expose sys/timeout.h via if_var.h or teach them to include it on it's own. Historically it seems that if.h was overused in userland hence there are plenty of other things (list heads for example) that get allocated while they clearly should be part of the ifnet. A thorough clean up is long due. In any case I think M_DEVBUF is the right type. We could also rename if_slowtimo to something more concrete like if_watchdog_timeout but that's just a bikeshed. We can, but as you said it's bikeshedding. I'm not saving the world here (c) Theo.
Re: Make every interface with a watchdog register it's own timeout
On Sun, Nov 23, 2014 at 12:06 +0100, Martin Pieuchot wrote: On 23/11/14(Sun) 02:10, Mike Belopuhov wrote: Hi, This removes the system wide if_slowtimo timeout and lets every interface with a valid if_watchdog method register it's own. The rational is to get rid of the ifnet loop in the softclock context to avoid further complications with concurrent access to the ifnet list. This might also save some cpu cycles in the runtime if number of interfaces is large while only a fraction all of them have a watchdog routine. This has originally come up on NetBSD mailing lists with some rather complicated locking strategy. Masao Uebayashi has later proposed a similar solution. The diff was looked at and corrected by mpi@. I've tested the trunk and a system build as well. OK? I'm basically ok, some remarks below. What about putting the allocation inside if_attach_common() with the hooks that are allocated for the same reason? Sure. Maybe we should explain why we allocate them? We all know why, don't think it will help anything. New version. OK? diff --git sys/net/if.c sys/net/if.c index 0632eec..679c8c8 100644 --- sys/net/if.c +++ sys/net/if.c @@ -133,11 +133,10 @@ void if_attachdomain1(struct ifnet *); void if_attach_common(struct ifnet *); void if_detach_queues(struct ifnet *, struct ifqueue *); void if_detached_start(struct ifnet *); intif_detached_ioctl(struct ifnet *, u_long, caddr_t); -void if_detached_watchdog(struct ifnet *); intif_getgroup(caddr_t, struct ifnet *); intif_getgroupmembers(caddr_t); intif_getgroupattribs(caddr_t); intif_setgroupattribs(caddr_t); @@ -169,16 +168,12 @@ int net_livelocked(void); * parameters. */ void ifinit() { - static struct timeout if_slowtim; - - timeout_set(if_slowtim, if_slowtimo, if_slowtim); timeout_set(net_tick_to, net_tick, net_tick_to); - if_slowtimo(if_slowtim); net_tick(net_tick_to); } static unsigned int if_index = 0; static unsigned int if_indexlim = 0; @@ -270,10 +265,13 @@ if_attachsetup(struct ifnet *ifp) if_attachdomain1(ifp); #if NPF 0 pfi_attach_ifnet(ifp); #endif + timeout_set(ifp-if_slowtimo, if_slowtimo, ifp); + if_slowtimo(ifp); + /* Announce the interface. */ rt_ifannouncemsg(ifp, IFAN_ARRIVAL); } /* @@ -426,10 +424,13 @@ if_attach_common(struct ifnet *ifp) M_TEMP, M_WAITOK); TAILQ_INIT(ifp-if_linkstatehooks); ifp-if_detachhooks = malloc(sizeof(*ifp-if_detachhooks), M_TEMP, M_WAITOK); TAILQ_INIT(ifp-if_detachhooks); + + ifp-if_slowtimo = malloc(sizeof(*ifp-if_slowtimo), M_TEMP, + M_WAITOK|M_ZERO); } void if_start(struct ifnet *ifp) { @@ -481,19 +482,22 @@ if_detach(struct ifnet *ifp) struct domain *dp; ifp-if_flags = ~IFF_OACTIVE; ifp-if_start = if_detached_start; ifp-if_ioctl = if_detached_ioctl; - ifp-if_watchdog = if_detached_watchdog; + ifp-if_watchdog = NULL; /* * Call detach hooks from head to tail. To make sure detach * hooks are executed in the reverse order they were added, all * the hooks have to be added to the head! */ dohooks(ifp-if_detachhooks, HOOK_REMOVE | HOOK_FREE); + /* Remove the watchdog timeout */ + timeout_del(ifp-if_slowtimo); + #if NBRIDGE 0 /* Remove the interface from any bridge it is part of. */ if (ifp-if_bridgeport) bridge_ifdetach(ifp); #endif @@ -577,10 +581,12 @@ do { \ free(ifp-if_addrhooks, M_TEMP, 0); free(ifp-if_linkstatehooks, M_TEMP, 0); free(ifp-if_detachhooks, M_TEMP, 0); + free(ifp-if_slowtimo, M_TEMP, sizeof(*ifp-if_slowtimo)); + for (dp = domains; dp; dp = dp-dom_next) { if (dp-dom_ifdetach ifp-if_afdata[dp-dom_family]) (*dp-dom_ifdetach)(ifp, ifp-if_afdata[dp-dom_family]); } @@ -1160,29 +1166,26 @@ if_link_state_change_task(void *arg, void *unused) } splx(s); } /* - * Handle interface watchdog timer routines. Called - * from softclock, we decrement timers (if set) and + * Handle interface watchdog timer routine. Called + * from softclock, we decrement timer (if set) and * call the appropriate interface routine on expiration. */ void if_slowtimo(void *arg) { - struct timeout *to = (struct timeout *)arg; - struct ifnet *ifp; + struct ifnet *ifp = arg; int s = splnet(); - TAILQ_FOREACH(ifp, ifnet, if_list) { - if (ifp-if_timer == 0 || --ifp-if_timer) - continue; - if (ifp-if_watchdog) + if (ifp-if_watchdog) { + if (ifp-if_timer 0 --ifp-if_timer == 0) (*ifp-if_watchdog)(ifp); + timeout_add(ifp
Trimming tcpdump a bit
Hi, I've been trying to fix a bug in tcpdump but the rottenness of the current code base with it's horrendous APIs is just getting in the way. What if we trimmed it a bit, say killed all those pesky 'register' values, kill protocols that we cannot really test (appletalk, fddi, etc.), kill dissectors that are barely implemented (krb for instance). Would anyone be against this? Does anyone have any plans to sync it up with the upstream? Here's a 91K diff to deregister tcpdump. http://www.vantronix.net/~mike/tcpdump-deregister.diff OK?
Re: Trimming tcpdump a bit
On 24 November 2014 at 17:20, Mike Belopuhov m...@belopuhov.com wrote: On 24 November 2014 at 16:42, Mike Belopuhov m...@belopuhov.com wrote: Hi, I've been trying to fix a bug in tcpdump but the rottenness of the current code base with it's horrendous APIs is just getting in the way. What if we trimmed it a bit, say killed all those pesky 'register' values, kill protocols that we cannot really test (appletalk, fddi, etc.), kill dissectors that are barely implemented (krb for instance). Would anyone be against this? Does anyone have any plans to sync it up with the upstream? Here's a 91K diff to deregister tcpdump. http://www.vantronix.net/~mike/tcpdump-deregister.diff OK? http://www.vantronix.net/~mike/tcpdump-atm.diff http://www.vantronix.net/~mike/tcpdump-fddi.diff http://www.vantronix.net/~mike/tcpdump-slip.diff SLIP diff has an added benefit that /usr/include/net/slip.h will no longer be needed. AppleTalk? http://www.vantronix.net/~mike/tcpdump-atalk.diff DECnet? http://www.vantronix.net/~mike/tcpdump-decnet.diff
Re: Trimming tcpdump a bit
On 24 November 2014 at 16:42, Mike Belopuhov m...@belopuhov.com wrote: Hi, I've been trying to fix a bug in tcpdump but the rottenness of the current code base with it's horrendous APIs is just getting in the way. What if we trimmed it a bit, say killed all those pesky 'register' values, kill protocols that we cannot really test (appletalk, fddi, etc.), kill dissectors that are barely implemented (krb for instance). Would anyone be against this? Does anyone have any plans to sync it up with the upstream? Here's a 91K diff to deregister tcpdump. http://www.vantronix.net/~mike/tcpdump-deregister.diff OK? Apparently the answer is no. I'm withdrawing all diffs.
tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned
Hi, IP header is not always aligned since bpf copies out the mbuf chain into the contigous buffer provided by the userland. I've seen this with large packet sizes on VLANs. ip_print will then copy the packet but the Ethernet header into the internal buffer so that it can cast it to the IP header structure and update global packetp and snapend pointers hence preventing the -Xx dumping code from printing out the Ethernet header itself. Diff below fixes it. OK? diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c index 3f4194c..6d90b3d 100644 --- usr.sbin/tcpdump/print-ip.c +++ usr.sbin/tcpdump/print-ip.c @@ -252,7 +252,7 @@ ip_printroute(const char *type, register const u_char *cp, u_int length) * print IP options. */ static void -ip_optprint(register const u_char *cp, u_int length) +ip_optprint(register const u_char *cp, u_char *send, u_int length) { register u_int len; int tt; @@ -265,7 +265,7 @@ ip_optprint(register const u_char *cp, u_int length) printf([|ip op len %d], len); return; } - if (cp[1] = snapend || cp + len snapend) { + if (cp[1] = send || cp + len send) { printf([|ip]); return; } @@ -356,12 +356,11 @@ ip_print(register const u_char *bp, register u_int length) register const struct ip *ip; register u_int hlen, len, off; register const u_char *cp; + u_char *send = snapend; ip = (const struct ip *)bp; /* * If the IP header is not aligned, copy into abuf. -* This will never happen with BPF. It does happen with raw packet -* dumps from -r. */ if ((intptr_t)ip (sizeof(long)-1)) { static u_char *abuf = NULL; @@ -376,8 +375,7 @@ ip_print(register const u_char *bp, register u_int length) error(ip_print: malloc); } memmove((char *)abuf, (char *)ip, min(length, clen)); - snapend = abuf + clen; - packetp = abuf; + send = abuf + clen; ip = (struct ip *)abuf; /* We really want libpcap to give us aligned packets */ if (!didwarn) { @@ -538,7 +536,7 @@ ip_print(register const u_char *bp, register u_int length) #define IPPROTO_ETHERIP97 #endif case IPPROTO_ETHERIP: - etherip_print(cp, snapend - cp, len, + etherip_print(cp, send - cp, len, (const u_char *)ip); break; @@ -573,7 +571,7 @@ ip_print(register const u_char *bp, register u_int length) #endif case IPPROTO_PFSYNC: pfsync_ip_print(cp, - (int)(snapend - (u_char *)ip) - hlen, + (int)(send - (u_char *)ip) - hlen, (const u_char *)ip); break; @@ -638,7 +636,7 @@ ip_print(register const u_char *bp, register u_int length) } (void)printf(%slen %u, sep, ntohs(ip-ip_len)); sep = , ; - if ((u_char *)ip + hlen = snapend) { + if ((u_char *)ip + hlen = send) { u_int16_t sum, ip_sum; sum = in_cksum((const u_short *)ip, hlen, 0); if (sum != 0) { @@ -651,7 +649,7 @@ ip_print(register const u_char *bp, register u_int length) if (hlen sizeof(struct ip)) { hlen -= sizeof(struct ip); (void)printf(%soptlen=%d, sep, hlen); - ip_optprint((u_char *)(ip + 1), hlen); + ip_optprint((u_char *)(ip + 1), send, hlen); } printf()); }
Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned
On Nov 24, 2014 7:10 PM, Mike Belopuhov m...@belopuhov.com wrote: Hi, IP header is not always aligned since bpf copies out the mbuf chain into the contigous buffer provided by the userland. I've seen this with large packet sizes on VLANs. ip_print will then copy the packet but the Ethernet header into the internal buffer so that it can cast it to the IP header structure and update global packetp and snapend pointers hence preventing the -Xx dumping code from printing out the Ethernet header itself. Diff below fixes it. OK? Err, I think I've sent the wrong diff. Please ignore it. I'll resend it tomorrow.
Make every interface with a watchdog register it's own timeout
Hi, This removes the system wide if_slowtimo timeout and lets every interface with a valid if_watchdog method register it's own. The rational is to get rid of the ifnet loop in the softclock context to avoid further complications with concurrent access to the ifnet list. This might also save some cpu cycles in the runtime if number of interfaces is large while only a fraction all of them have a watchdog routine. This has originally come up on NetBSD mailing lists with some rather complicated locking strategy. Masao Uebayashi has later proposed a similar solution. The diff was looked at and corrected by mpi@. I've tested the trunk and a system build as well. OK? Index: sys/net/if.c === RCS file: /home/cvs/src/sys/net/if.c,v retrieving revision 1.303 diff -u -p -r1.303 if.c --- sys/net/if.c3 Nov 2014 11:02:08 - 1.303 +++ sys/net/if.c22 Nov 2014 16:25:34 - @@ -135,7 +135,6 @@ voidif_attach_common(struct ifnet *); void if_detach_queues(struct ifnet *, struct ifqueue *); void if_detached_start(struct ifnet *); intif_detached_ioctl(struct ifnet *, u_long, caddr_t); -void if_detached_watchdog(struct ifnet *); intif_getgroup(caddr_t, struct ifnet *); intif_getgroupmembers(caddr_t); @@ -171,12 +170,8 @@ intnet_livelocked(void); void ifinit() { - static struct timeout if_slowtim; - - timeout_set(if_slowtim, if_slowtimo, if_slowtim); timeout_set(net_tick_to, net_tick, net_tick_to); - if_slowtimo(if_slowtim); net_tick(net_tick_to); } @@ -272,6 +267,11 @@ if_attachsetup(struct ifnet *ifp) pfi_attach_ifnet(ifp); #endif + ifp-if_wdtimo = malloc(sizeof(*ifp-if_wdtimo), M_TEMP, + M_WAITOK|M_ZERO); + timeout_set(ifp-if_wdtimo, if_slowtimo, ifp); + if_slowtimo(ifp); + /* Announce the interface. */ rt_ifannouncemsg(ifp, IFAN_ARRIVAL); } @@ -483,7 +483,7 @@ if_detach(struct ifnet *ifp) ifp-if_flags = ~IFF_OACTIVE; ifp-if_start = if_detached_start; ifp-if_ioctl = if_detached_ioctl; - ifp-if_watchdog = if_detached_watchdog; + ifp-if_watchdog = NULL; /* * Call detach hooks from head to tail. To make sure detach @@ -492,6 +492,10 @@ if_detach(struct ifnet *ifp) */ dohooks(ifp-if_detachhooks, HOOK_REMOVE | HOOK_FREE); + /* Remove the watchdog timeout */ + timeout_del(ifp-if_wdtimo); + free(ifp-if_wdtimo, M_TEMP, sizeof(*ifp-if_wdtimo)); + #if NBRIDGE 0 /* Remove the interface from any bridge it is part of. */ if (ifp-if_bridgeport) @@ -1162,25 +1166,22 @@ if_link_state_change_task(void *arg, voi } /* - * Handle interface watchdog timer routines. Called - * from softclock, we decrement timers (if set) and + * Handle interface watchdog timer routine. Called + * from softclock, we decrement timer (if set) and * call the appropriate interface routine on expiration. */ void if_slowtimo(void *arg) { - struct timeout *to = (struct timeout *)arg; - struct ifnet *ifp; + struct ifnet *ifp = arg; int s = splnet(); - TAILQ_FOREACH(ifp, ifnet, if_list) { - if (ifp-if_timer == 0 || --ifp-if_timer) - continue; - if (ifp-if_watchdog) + if (ifp-if_watchdog) { + if (ifp-if_timer 0 --ifp-if_timer == 0) (*ifp-if_watchdog)(ifp); + timeout_add(ifp-if_wdtimo, hz / IFNET_SLOWHZ); } splx(s); - timeout_add(to, hz / IFNET_SLOWHZ); } /* @@ -1824,12 +1825,6 @@ int if_detached_ioctl(struct ifnet *ifp, u_long a, caddr_t b) { return ENODEV; -} - -void -if_detached_watchdog(struct ifnet *ifp) -{ - /* nothing */ } /* Index: sys/net/if_trunk.c === RCS file: /home/cvs/src/sys/net/if_trunk.c,v retrieving revision 1.91 diff -u -p -r1.91 if_trunk.c --- sys/net/if_trunk.c 18 Nov 2014 02:37:31 - 1.91 +++ sys/net/if_trunk.c 22 Nov 2014 16:25:34 - @@ -344,11 +344,13 @@ trunk_port_create(struct trunk_softc *tr tp-tp_iftype = ifp-if_type; ifp-if_type = IFT_IEEE8023ADLAG; ifp-if_tp = (caddr_t)tp; - tp-tp_watchdog = ifp-if_watchdog; - ifp-if_watchdog = trunk_port_watchdog; tp-tp_ioctl = ifp-if_ioctl; ifp-if_ioctl = trunk_port_ioctl; + timeout_del(ifp-if_wdtimo); + tp-tp_watchdog = ifp-if_watchdog; + ifp-if_watchdog = trunk_port_watchdog; + tp-tp_if = ifp; tp-tp_trunk = tr; @@ -427,8 +429,8 @@ trunk_port_destroy(struct trunk_port *tp /* Restore interface */ ifp-if_type = tp-tp_iftype; - ifp-if_watchdog = tp-tp_watchdog; ifp-if_ioctl = tp-tp_ioctl; + ifp-if_watchdog = tp-tp_watchdog; ifp-if_tp = NULL;
Re: Make every interface with a watchdog register it's own timeout
On Sun, Nov 23, 2014 at 02:10 +0100, Mike Belopuhov wrote: Hi, This removes the system wide if_slowtimo timeout and lets every interface with a valid if_watchdog method register it's own. The rational is to get rid of the ifnet loop in the softclock context to avoid further complications with concurrent access to the ifnet list. This might also save some cpu cycles in the runtime if number of interfaces is large while only a fraction all of them have a watchdog routine. This has originally come up on NetBSD mailing lists with some rather complicated locking strategy. Masao Uebayashi has later proposed a similar solution. The diff was looked at and corrected by mpi@. I've tested the trunk and a system build as well. OK? [snip] Index: sys/net/if_var.h === RCS file: /home/cvs/src/sys/net/if_var.h,v retrieving revision 1.12 diff -u -p -r1.12 if_var.h --- sys/net/if_var.h 8 Jul 2014 04:02:14 - 1.12 +++ sys/net/if_var.h 22 Nov 2014 16:25:34 - @@ -72,6 +72,7 @@ struct mbuf; struct proc; struct rtentry; struct socket; +struct timeout; struct ether_header; struct arpcom; struct rt_addrinfo; @@ -147,6 +148,7 @@ struct ifnet {/* and the entries */ charif_description[IFDESCRSIZE]; /* interface description */ u_short if_rtlabelid; /* next route label */ u_int8_t if_priority; + struct timeout *if_wdtimo; /* watchdog timeout */ /* procedure handles */ /* output routine (enqueue) */ Now that I think of it, perhaps I should have called it if_slowtimo like the function. Any preference?
remove useless includes in netstat
apparently these are not needed and just make my life harder. ok? diff --git usr.bin/netstat/if.c usr.bin/netstat/if.c index f722db23..b07860d 100644 --- usr.bin/netstat/if.c +++ usr.bin/netstat/if.c @@ -36,11 +36,10 @@ #include sys/protosw.h #include sys/socket.h #include sys/sysctl.h #include net/if.h -#include net/if_var.h #include net/if_dl.h #include net/if_types.h #include net/route.h #include netinet/in.h #include netinet/in_var.h diff --git usr.bin/netstat/mroute6.c usr.bin/netstat/mroute6.c index b198ea4..6b17f57 100644 --- usr.bin/netstat/mroute6.c +++ usr.bin/netstat/mroute6.c @@ -65,11 +65,10 @@ */ #include sys/param.h #include sys/queue.h #include sys/socket.h -#include sys/socketvar.h #include sys/protosw.h #include sys/sysctl.h #include net/if.h #include net/if_var.h diff --git usr.bin/netstat/net80211.c usr.bin/netstat/net80211.c index dc40c09..4162d59 100644 --- usr.bin/netstat/net80211.c +++ usr.bin/netstat/net80211.c @@ -21,11 +21,10 @@ #include sys/socket.h #include sys/file.h #include sys/ioctl.h #include net/if.h -#include net/if_var.h #include netinet/in.h #include netinet/if_ether.h #include net80211/ieee80211.h
Re: ping6 to Link Local disturbed by pf set skip?
On 18 November 2014 15:30, Martin Pieuchot mpieuc...@nolizard.org wrote: On 13/11/14(Thu) 16:41, Stuart Henderson wrote: This changes behaviour of ping6 ff02::1%pppoe0 for me, previously I saw a response to each icmp message in the sequence, now I just see the first response. I am not using set skip on that machine. $ ping6 ff02::1%pppoe0 PING6(56=40+8+8 bytes) fe80::225:90ff:fec0:77b4%pppoe0 -- ff02::1%pppoe0 16 bytes from fe80:15::225:90ff:fec0:77b4, icmp_seq=0 hlim=64 time=0.271 ms ^C --- ff02::1%pppoe0 ping6 statistics --- 6 packets transmitted, 1 packets received, 83.3% packet loss round-trip min/avg/max/std-dev = 0.271/0.271/0.271/0.000 ms Thanks for spotting this Stuart, diff below fixes this issue. It was another funky situation to debug. Problem was that 2 new states were created on loopback for the echo reply instead of matching the corresponding echo request states. But according to mikeb@ new states should never be created for replies. So the diff below changes the icmp multicast check in pf to take into consideration the scope of the sender, and in a more general way no longer clear the scopes of addresses for packets sent through loopback when calling pf_test(). To put it into the perspective: firewall policy wise we want to differentiate between traffic on one interface and traffic on the other interface. Since you can have exactly the same IPv6 link-local addresses on two interfaces that only differ in scope IDs we want to make sure that state matching code treats packets outgoing on these two interfaces as subjects to different policy restrictions and therefore can't match the same state. With this patch we ensure that IPv6 input and output routines provide scoped addresses to the pf and pf uses them to create states. Needless to say I'm OK with both diffs.
Re: VPLS patch [0/3]: introduction
On 14 November 2014 17:26, Rafael Zalamena rzalam...@gmail.com wrote: On Sun, Sep 14, 2014 at 11:48:11PM -0300, Rafael Zalamena wrote: The following mails will contain patchs that implement the VPLS datapath in OpenBSD. Applying all patchs should allow people to configure a network using VPLS manually. --- snipped diffs descriptions --- How to use: * Create a MPLS network. Example: http://2011.eurobsdcon.org/papers/jeker/MPLS.pdf * Create a pseudowire in both ends of your network (PEs) # ifconfig wirenumber encap ethernet wirelabel local label \ remote label neighbor other PE address controlword up # ifconfig wire0 encap ethernet wirelabel 500 500 neighbor 1.2.3.4 up or # ifconfig wire0 encap ethernet wirelabel 500 500 neighbor 1.2.3.4 \ controlword up * Create a bridge between the interface facing your customer (CE) and your wireX, also in both PEs you are configuring the VPN. --- more comments snipped --- TODO list: * interface configuration code - SIOCSETWIRECFG / SIOCGETWIRECFG (DONE) * add / remove wire label (DONE) * add / remove wire control label (DONE) * ethernet-vlan support (WIP) Ethernet-tagged support almost complete, it's not working in the case when you have packets coming with 2 or more tags. I'm having problems to test this since 5.6 and -current doesn't do QinQ properly. (I'll be sending proposal diffs to fix this soon) * ifconfig(8) integration ** show wire configuration (DONE) ** configure wire (DONE) * man page: ** wire(4) (TODO) ** ifconfig(8) (TODO) wire(4) is depending on some other diffs now that are unrelated to this, please see: (update mpe to use rt_ifa, wire will use that too) http://marc.info/?l=openbsd-techm=141280528700615w=2 (fix bridge + vlan, bridge expects the complete packet) http://marc.info/?l=openbsd-techm=141575896420071w=2 is it possible to call it something other than just wire(4)? vpls maybe?
Re: improving OpenBSD's gmac.c...
On 10 October 2014 02:39, Damien Miller d...@mindrot.org wrote: On Thu, 9 Oct 2014, Christian Weisgerber wrote: John-Mark Gurney: I also have an implementation of ghash that does a 4 bit lookup table version with the table split between cache lines in p4 at: https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4 This also has a version with does 4 blocks at a time getting a further speed up... FWIW, I did a quick dirty merge of this into the OpenBSD tree and the speed of my test (net6501-50, tcpbench -u over esp aes-128-gmac) almost doubled. isn't this likely to make it more likely to be subject to timing attacks? then how is this different to our table based aes implementation? and it's the same C code as in openssl which also uses table based gcm implementation. what countermeasures can be applied to the table lookup code to fight these attacks?
Re: 5.6 Icmp6 checksum / pf
On Sun, Nov 09, 2014 at 10:17 +0100, Bastien Durel wrote: Hi, I recently upgraded to 5.6, and got problems with icmpv6 I have a gif tunnel for IPv6: [root@fremen root]# ifconfig gif0 gif0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1280 description: Sixxs priority: 0 groups: gif egress tunnel: inet 78.194.94.73 - 212.100.184.146 inet6 fe80::200:24ff:fecf:42ac%gif0 - prefixlen 64 scopeid 0x10 inet6 2001:6f8:202:19c::2 - 2001:6f8:202:19c::1 prefixlen 128 When I initiate a ping from this interface, it works as intended: 08:59:38.376107 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo request (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 64) 08:59:38.410385 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo reply (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 57) 08:59:39.389383 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo request (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 64) 08:59:39.421397 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo reply (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 57) But when a ping from outside reached it, the echo reply is sent with a bad (0) checksum, and the packet is dropped by te other side: 09:40:28.852102 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo request (id:6c10 seq:1) [icmp6 cksum ok] (len 64, hlim 57) 09:40:28.852251 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo reply (id:6c10 seq:1) [bad icmp6 cksum 0! - 2d93] (len 64, hlim 64) 09:40:29.860327 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo request (id:6c10 seq:2) [icmp6 cksum ok] (len 64, hlim 57) 09:40:29.860432 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo reply (id:6c10 seq:2) [bad icmp6 cksum 0! - 8a71] (len 64, hlim 64) It works correctly with this pf rule disabled: pass in on gif0 reply-to ( gif0 2001:6f8:202:19c::1 ) keep state What's the correct way to handle correct return-path ? Regards, -- Bastien Durel hi, looks like it's caused by the forgotten in6_proto_cksum_out call in the pf_route6 after possible pf_map_addr/PF_ACPY manipulations. bastien, can you please verify that the diff below fixes your issue? diff --git sys/net/pf.c sys/net/pf.c index 979f44d..4c934db 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -5777,20 +5777,22 @@ pf_route6(struct mbuf **m, struct pf_rule *r, int dir, struct ifnet *oifp, goto bad; else if (m0 == NULL) goto done; if (m0-m_len sizeof(struct ip6_hdr)) { DPFPRINTF(LOG_ERR, pf_route6: m0-m_len sizeof(struct ip6_hdr)); goto bad; } } + in6_proto_cksum_out(m0, ifp); + /* * If the packet is too large for the outgoing interface, * send back an icmp6 error. */ if (IN6_IS_SCOPE_EMBED(dst-sin6_addr)) dst-sin6_addr.s6_addr16[1] = htons(ifp-if_index); if ((u_long)m0-m_pkthdr.len = ifp-if_mtu) { nd6_output(ifp, ifp, m0, dst, NULL); } else { in6_ifstat_inc(ifp, ifs6_in_toobig);
Re: iked responds with esp over external ips.
On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote: Hello! I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router running strongswan. The tunnel works great with ping and other traffic but traffic between the two external ip's dies. This is a site-to-site connection and nothing fancy. iked.conf on OpenBSD. ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se # ipsecctl -sa FLOWS: flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type use flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type require flow esp out from ::/0 to ::/0 type deny SAD: esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1 enc aes esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1 enc aes # netstat -nr Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface default130.51.23.4 UGS 10 30430256 - 8 em0 10/8 link#5 UC 10 - 4 vether0 10.11.12.13fe:e1:ba:d0:d6:1c UHLl 01 - 1 lo0 10.255.255.255 link#5 UHLc 3 570 - 4 vether0 82.17.12.21 130.51.23.4 UGHD 0 30430251 - L 56 em0 127/8 127.0.0.1 UGRS 00 32768 8 lo0 127.0.0.1 127.0.0.1 UH 16 32768 4 lo0 194.48.213.128/27 link#1 UC 10 - 4 em0 130.51.23.4 00:00:cd:19:95:16 UHLc 20 - 4 em0 130.51.23.4 00:02:b3:aa:cc:c3 HLl00 - 1 lo0 224/4 127.0.0.1 URS00 32768 8 lo0 Internet6: -removed, dont use it- Encap: Source Port DestinationPort Proto SA(Address/Proto/Type/Direction) 192.168.4/24 0 10.11.12/240 0 82.17.12.21/esp/use/in 10.11.12/240 192.168.4/24 0 0 82.17.12.21/esp/require/out default0 default 0 0 none/esp/deny/out # tcpdump on openbsd while trying to connect with ssh to the external ip of the OpenBSD host from the exernal ip of the other end. # tcpdump host 82.17.12.21 tcpdump: listening on em0, link-type EN10MB tcpdump: WARNING: compensating for unaligned libpcap packets 16:49:55.539903 egget.priv.lamest.se.54158 loller.sippan.se.ssh: S 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK (DF) 16:49:55.539932 loller.sippan.se.ssh egget.priv.lamest.se.54158: S 2317435827:2317435827(0) ack 2729317718 win 16384 mss 1240,nop,nop,sackOK,nop,wscale 3 16:49:55.545936 egget.priv.lamest.se.54158 loller.sippan.se.ssh: . ack 1 win 256 (DF) 16:49:55.553927 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 190 len 100 16:50:01.553883 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 191 len 100 16:50:05.977468 esp egget.priv.lamest.se loller.sippan.se spi 0x67483925 seq 127 len 84 (DF) 16:50:05.977519 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 192 len 84 # tcpdump on enc0 while trying ssh and https tcpdump: listening on enc0, link-type ENC tcpdump: WARNING: compensating for unaligned libpcap packets 17:01:01.578622 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.ssh egget.priv.lamest.se.54158: R 2317435850:2317435850(0) ack 2729317718 win 0 (encap) 17:01:05.786123 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.ssh egget.priv.lamest.se.54792: P 3813334764:3813334785(21) ack 2711749548 win 2170 (encap) 17:01:05.968654 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: P 3540908942:3540909100(158) ack 1840586787 win 2170 (encap) 17:01:06.265543 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:06.876165 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:08.095189 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:10.459116 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) So it appears that OpenBSD tries to send back traffic with ESP when it shouldn't. I'd also like to add that the exact same setup works with with isakmpd. Best regards Martin This is a known issue. The reason why it happens is not strictly documented. The change to the stack was introduced as part of the cleanup long time ago. Supposedly it's there to support TCP MD5 signatures, but I didn't verify that yet. There's this code in the
Re: iked responds with esp over external ips.
On 5 November 2014 13:28, Mike Belopuhov m...@belopuhov.com wrote: On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote: Hello! I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router running strongswan. The tunnel works great with ping and other traffic but traffic between the two external ip's dies. This is a site-to-site connection and nothing fancy. iked.conf on OpenBSD. ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se # ipsecctl -sa FLOWS: flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type use flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type require flow esp out from ::/0 to ::/0 type deny SAD: esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1 enc aes esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1 enc aes # netstat -nr Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface default130.51.23.4 UGS 10 30430256 - 8 em0 10/8 link#5 UC 10 - 4 vether0 10.11.12.13fe:e1:ba:d0:d6:1c UHLl 01 - 1 lo0 10.255.255.255 link#5 UHLc 3 570 - 4 vether0 82.17.12.21 130.51.23.4 UGHD 0 30430251 - L 56 em0 127/8 127.0.0.1 UGRS 00 32768 8 lo0 127.0.0.1 127.0.0.1 UH 16 32768 4 lo0 194.48.213.128/27 link#1 UC 10 - 4 em0 130.51.23.4 00:00:cd:19:95:16 UHLc 20 - 4 em0 130.51.23.4 00:02:b3:aa:cc:c3 HLl00 - 1 lo0 224/4 127.0.0.1 URS00 32768 8 lo0 Internet6: -removed, dont use it- Encap: Source Port DestinationPort Proto SA(Address/Proto/Type/Direction) 192.168.4/24 0 10.11.12/240 0 82.17.12.21/esp/use/in 10.11.12/240 192.168.4/24 0 0 82.17.12.21/esp/require/out default0 default 0 0 none/esp/deny/out # tcpdump on openbsd while trying to connect with ssh to the external ip of the OpenBSD host from the exernal ip of the other end. # tcpdump host 82.17.12.21 tcpdump: listening on em0, link-type EN10MB tcpdump: WARNING: compensating for unaligned libpcap packets 16:49:55.539903 egget.priv.lamest.se.54158 loller.sippan.se.ssh: S 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK (DF) 16:49:55.539932 loller.sippan.se.ssh egget.priv.lamest.se.54158: S 2317435827:2317435827(0) ack 2729317718 win 16384 mss 1240,nop,nop,sackOK,nop,wscale 3 16:49:55.545936 egget.priv.lamest.se.54158 loller.sippan.se.ssh: . ack 1 win 256 (DF) 16:49:55.553927 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 190 len 100 16:50:01.553883 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 191 len 100 16:50:05.977468 esp egget.priv.lamest.se loller.sippan.se spi 0x67483925 seq 127 len 84 (DF) 16:50:05.977519 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 192 len 84 There's another interesting behavior evident in this trace: first few (usually 1 or 2) packets (SYN-ACK from loller here) are not encrypted. That's because we update our PCB on input due to the presence of a matching SA in a very agile manner: first it doesn't really match, but we make sure next time it does (funny logic in the ipsp_spd_inp that coupled with the default IPsec policy).
Re: Multipath for HOST p2p routes
On 4 November 2014 12:51, Martin Pieuchot mpieuc...@nolizard.org wrote: How are we suppose to support configuration with multiple p2p interfaces pointing to the same destination address? Right now only one route to host is added. Diff below replaces a hack that move a host route from one p2p interface to another by always installing MPATH routes. It assumes that multiples p2p interfaces configured to the same destination address will have different local addresses. ok? After discussing it with Martin it sounds like a good idea. There's some semi-unrelated cleanup in in_ifinit that he has promised to commit separately. OK mikeb for both changes.
Re: Kill in_iawithaddr()
On 4 November 2014 12:52, Martin Pieuchot mpieuc...@nolizard.org wrote: This function is just a wrapper around ifa_ifwithaddr() and I'd prefer to have less function iterating over the global list of interfaces. ok? what's not immediately apparent is that it also makes sure that the address that ifa_ifwithaddr returns is not a broadcast address. oh god. ok for the cleanup
Re: network pool names
On 4 November 2014 13:23, Martin Pieuchot mpieuc...@nolizard.org wrote: Remove pl suffix, ok? ok with a syncache instead of syn
Re: pool page colouring
On 5 November 2014 01:12, Mike Belopuhov m...@belopuhov.com wrote: well, first of all, right now this is a rather theoretical gain. we need to test it to understand if it makes things easier. err. i meant to say go faster not easier.
Re: pool page colouring
On 5 November 2014 00:38, David Gwynne da...@gwynne.id.au wrote: On 30 Oct 2014, at 07:52, Ted Unangst t...@tedunangst.com wrote: On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote: i dunno. im fine with either removing colouring altogether or setting it from something else completely. i just want a decision to be made cos right now ph_color isnt set, which is a bug. there. i fixed it. looks like we were both ignorant and wrong. mikeb@ points out this from the original slab paper: 4.1. Impact of Buffer Address Distribution on Cache Utilization The address distribution of mid-size buffers can affect the system’s overall cache utilization. In par- ticular, power-of-two allocators - where all buffers are 2 n bytes and are 2 n -byte aligned - are pes- simal.* Suppose, for example, that every inode (∼ 300 bytes) is assigned a 512-byte buffer, 512-byte aligned, and that only the first dozen fields of an inode (48 bytes) are frequently referenced. Then the majority of inode-related memory traffic will be at addresses between 0 and 47 modulo 512. Thus the cache lines near 512-byte boundaries will be heavily loaded while the rest lie fallow. In effect only 9% (48/512) of the cache will be usable by inodes. Fully-associative caches would not suffer this problem, but current hardware trends are toward simpler rather than more complex caches. 4.3. Slab Coloring The slab allocator incorporates a simple coloring scheme that distributes buffers evenly throughout the cache, resulting in excellent cache utilization and bus balance. The concept is simple: each time a new slab is created, the buffer addresses start at a slightly different offset (color) from the slab base (which is always page-aligned). For example, for a cache of 200-byte objects with 8-byte alignment, the first slab’s buffers would be at addresses 0, 200, 400, ... relative to the slab base. The next slab’s buffers would be at offsets 8, 208, 408, ... and so on. The maximum slab color is determined by the amount of unused space in the slab. we run on enough different machines that i think we should consider this. well, first of all, right now this is a rather theoretical gain. we need to test it to understand if it makes things easier. to see cache statistics we can use performance counters, however current pctr code might be a bit out of date. so the question is if we do bring colouring back, how do we calculate it? arc4random? mask bits off ph_magic? atomic_inc something in the pool? read a counter from the pool? shift bits off the page address? the way i read it is that you have a per-pool running value pr_color that you increment by the item alignment or native cache line size modulo space available for every page you are getting from uvm. however i can see that it might entail a problem by locating a page header (or was it page boundary? don't have the code at hand) using simple math.
Re: pool page colouring
On 29 October 2014 22:52, Ted Unangst t...@tedunangst.com wrote: On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote: i dunno. im fine with either removing colouring altogether or setting it from something else completely. i just want a decision to be made cos right now ph_color isnt set, which is a bug. there. i fixed it. so is there any performance difference?
Re: pool page colouring
On 28 October 2014 17:02, Ted Unangst t...@tedunangst.com wrote: On Tue, Oct 28, 2014 at 16:49, David Gwynne wrote: when i shuffled the locking in pools around, page colouring was left behind. page colouring is where you offset items within a page if you have enough slack space. the previous implementation simply incremented the colour so each new page got the next offset. i didnt do this because the page and its items are now initted outside the lock, so maintaining that curcolour iterator wasnt as easy. this sidesteps the curcolor maintenance by just having each page randomly pick a colour when it's set up. Would it make more sense to use the page address to pick the color? Does it actually still make sense to keep page coloring? Is there still benefit on modern hardware?
Re: A system without interface?
On 14 October 2014 11:01, Martin Pieuchot mpieuc...@nolizard.org wrote: On 08/10/14(Wed) 14:29, Martin Pieuchot wrote: I'm looking after the uses of the global list of interface. These ones are pointless, you always have at least one interface on your system. Ok? Anyone? looks good to me. ok mikeb
Re: RTFREE - rtfree
On 8 October 2014 12:24, Martin Pieuchot mpieuc...@nolizard.org wrote: Diff below kills the macro and use the fonction instead since they are equivalent. It also replaces some 0 - NULL where it applies. It does not include the manpage bits, I'll deal with that afterward. I'd appreciate reviews and oks. although syn_cache_put doesn't really require nullification, i'm ok with the diff.
Re: improving OpenBSD's gmac.c...
On 8 October 2014 00:48, John-Mark Gurney j...@funkthat.com wrote: Christian Weisgerber wrote this message on Tue, Oct 07, 2014 at 23:08 +0200: John-Mark Gurney: So, as I was working on FreeBSD's implementation of gmac.c, I noticed that I was able to get a significant speed up by using a mask instead of an if branch in ghash_gfmul in gmac.c from OpenBSD... Add a mask var and replace the code between the comments update Z and update V w/: mask = !!(x[i 3] (1 (~i 7))); mask = ~(mask - 1); z[0] ^= v[0] mask; z[1] ^= v[1] mask; z[2] ^= v[2] mask; z[3] ^= v[3] mask; And you should see a nice performance increase... I tried this on a Soekris net6501-50 and the performance increase was around 1.3%. (I set up an ESP transport association with AES-128-GMAC and pushed UDP traffic with tcpbench over it.) Yeh, I benchmarked the raw algo in userland, not part of IPsec. I forget the resulting perf increase, but it was well more than 10-20%. A look at the generated amd64 assembly code shows that the change indeed removes a branch. What's pretty shocking is that this code mul = v[3] 1; ... v[0] = (v[0] 1) ^ (0xe100 * mul); is turned into an actual imul instruction by GCC. I used the same masking approach to get rid of the multiplication, but the improvement was minuscule (1%). Hmm. interesting... In my code I switched both to using the and operator... I also have code which switches the registers to 64bits so that on amd64, we make better uses of registers, and on i386, the compilers are good at breaking down the 64bit registers to 32bit w/o extra work... I also have an implementation of ghash that does a 4 bit lookup table version with the table split between cache lines in p4 at: https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4 I'll have to look at this, but haven't there been increasing misgivings about table implementations for GHASH because of timing attacks? Well, the code avoids one issue but introduces another issue... Each table lookup accesses all four lines (assuming 64 byte cache lines), so there isn't a problem there... The issue intrroduded is that the first 64 bits of a cache line are faster to access than the remaining bits... For more info, look at the NSS bug: https://bugzilla.mozilla.org/show_bug.cgi?id=868948 Though considering that the AES implementation that FreeBSD is still using is the standard table based AES, there are also timing issues there too... Yes, no excuse for opening up additional windows... I have thought about introducing an option of slow but secure and fast but insecure against local attackers... Since we are already on the fast but insecure against local attackers path, I figured I'll take the extra performance... As with all things, there is a trade off between speed and security... As most IPsec gateways don't have a local user trying to target the key, this is less of an issue... And even if they did, it'd probably be easier to use a local root exploit to get the key than try to mount a timing attack to extract a key that might change in an hour or a day... -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. Hi, I've talked to Theo and it looks like we'll be importing your GF2 multiplication library as is. I think we should concentrate on making table version of gmac.c work better. Cheers, Mike
Re: splnet() and SIOCSIFADDR
On 3 September 2014 15:53, Martin Pieuchot mpieuc...@nolizard.org wrote: On 03/09/14(Wed) 15:25, Martin Pieuchot wrote: Drivers that need a splnet() protection inside their SIOCSIFADDR generally raise the spl level themselves, so we should not need to do that in in{6,}_ifinit(). One exception to this rule is, as always, carp(4)... So the diff below moves the spl dance inside carp's SIOCSIFADDR handler, it's a baby step, so we can take care of carp_set_addr{6,} later. mikeb@ pointed out that a spl dance was missing from the SIOCDIFADDR_IN6 ioctl, updated diff below also fixes that. i think we're ready to step on a carefully placed landmine. OK mikeb
Re: bge(4) Jumbo support for newer chipsets
On 2 September 2014 03:54, Brad Smith b...@comstyle.com wrote: On Wed, Aug 27, 2014 at 02:25:27AM -0400, Brad Smith wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. Here is an updated diff with bge_rxrinfo() being fixed. get it in. OK mikeb
Re: minphys woes
On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Mike Belopuhov wrote: correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys. Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change.
Re: reduce the number of missed PCB cache with tcpbench -su
On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote: What's the benefit of this? This creates a UDP PCB per connection. Otherwise we always rely on matching the wildcard PCB. I've never seen an application do this; I doubt that. However, things like NTP or DNS servers usually expect requests from everyone so they don't do it. Now I have actually identified one use case that this diff breaks: multiple senders won't work. Which is kind of annoying. wouldn't it be better to improve the PCB cache so it caught this case (which seems the usual way UDP applications behave) instead? Definitely! Using something like a prefix tree (mickey advocates hash array mapped tries) would eliminate this second lookup and also solve problems with attacking PCB hash with collisions and resizing the hash table. -d
Re: minphys woes
On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote: On Mon, 1 Sep 2014, Mike Belopuhov wrote: On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. That's possible. But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change. You lost me there. Why should this depend on how many requests some filesystem makes with larger blocks? If there is the possibility that filesystems do such requests, someone needs to do the splitting of the requests. The question is who. The driver or the block layer. well, the filesystem doesn't issue requests for more than 64k at a time. newfs won't allow you to build a 128k block file filesystem. BTW, for my use case I don't actually want to limit the block size, but rather the number of DMA segments. But the most reasonable way to do that seemed to set minphys to max_segments * pagesize. If we change how these things work, we could take the number of DMA segments into account. can't help you with this one.
Re: reduce the number of missed PCB cache with tcpbench -su
Daniel, don't reply anything to Damien just yet. Can you please run a simple test on Monday. Try tcpbench -u -n 2 ip (as in multi- connection test) without your patch and then with the patch and see if behavior is changed. Thanks On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote: On Fri, 29 Aug 2014, Daniel Jakots wrote: Hi, When running tcpbench -su, a lot of them are counted as missed PCB cache. ... + n = recvfrom(fd, ptb-dummybuf, ptb-dummybuf_len, 0, + (struct sockaddr *)ss, slen); + if (n 0 connect(fd, (const struct sockaddr *)ss, slen)) + warn(fail to connect); What's the benefit of this? I've never seen an application do this; wouldn't it be better to improve the PCB cache so it caught this case (which seems the usual way UDP applications behave) instead? -d
Re: minphys woes
On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called. correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys.
Re: newfs.8
On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote: is this correct? i'm not a user myself, but the text states that special, for mount_mfs, is typically that of the primary swap area. how would you even define the swap area without a disklabel? jmc sort of yes. mount_mfs(8) says this: [...] The special file is only used to read the disk label which provides a set of configuration parameters for the memory based file system. The special file is typically that of the primary swap area, since that is where the file system will be backed up when free memory gets low and the memory supporting the file system has to be paged. If the keyword ``swap'' is used instead of a special file name, default configuration parameters will be used. (This option is useful when trying to use mount_mfs on a machine without any disks.) in reality it fakes up a disklabel and proceeds. number of XXX's around this code is also mildly amusing...
Re: newfs.8
On 29 August 2014 13:44, Jason McIntyre j...@kerhand.co.uk wrote: On Fri, Aug 29, 2014 at 01:39:57PM +0200, Mike Belopuhov wrote: On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote: is this correct? i'm not a user myself, but the text states that special, for mount_mfs, is typically that of the primary swap area. how would you even define the swap area without a disklabel? jmc sort of yes. mount_mfs(8) says this: [...] The special file is only used to read the disk label which provides a set of configuration parameters for the memory based file system. The special file is typically that of the primary swap area, since that is where the file system will be backed up when free memory gets low and the memory supporting the file system has to be paged. If the keyword ``swap'' is used instead of a special file name, default configuration parameters will be used. (This option is useful when trying to use mount_mfs on a machine without any disks.) in reality it fakes up a disklabel and proceeds. number of XXX's around this code is also mildly amusing... so the diff posted was correct and should be committed? jmc yes, i believe so. OK mikeb
Re: bge(4) Jumbo support for newer chipsets
On 28 August 2014 12:32, David Gwynne da...@gwynne.id.au wrote: On 28 Aug 2014, at 3:02 am, Mike Belopuhov m...@belopuhov.com wrote: On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 (0x5719001), APE firmware NCSI 1.1.15.0 and Broadcom BCM5714 rev 0xa3, BCM5715 A3 (0x9003). it works, however i'm not strictly a fan of switching the cluster pool to larger one for 5714. wasting another 8k page (on sparc for example) for every rx cluster in 90% cases sounds kinda wrong to me. but ymmv. this is what MCLGETI was invented to solve though. comparing pre mclgeti to what this does: that doesn't make my point invalid though. a 5714 right now without jumbos would have 512 rings entries with 2048 bytes on each. 2048 * 512 is a 1024k of ram. if we bumped the std ring up to jumbos by default, 9216 * 512 would eat 4608k of ram. your calculation is a bit off. it's not 9216 * 512, in case of sparc64 it's 8k * 2 * 512 which is 8M. but my concern is different: you ask uvm to do more work for every cluster since now you need 2 consequent pages of memory for one cluster instead of just one that fits 8k/2k = 4 clusters. my boxes with bge with mclgeti generally sit around 40 clusters, but sometimes end up around 80. 80 * 9216 is 720k. we can have jumbos and still be ahead. if you compare the nics with split rings: 512 * 2048 + 256 * 9216 is ~3.3M. the same chip with mclgeti and only doing a 1500 byte workload would be 80 * 2048 + 17 * 9216, or 300k. apart from that there's a deficiency in the diff itself. you probably want to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics look wrong. yeah. i have tested both 1500 and 9000 mtus on a 5714 and it is working well. as you say, 5719 seems to be fine too, but ive only tested it with mtu 1500. ill test 9k tomorrow. it needs tests on older chips too though.
Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu
On 27 August 2014 13:17, David Gwynne da...@gwynne.id.au wrote: On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote: On 20/08/14 8:03 PM, David Gwynne wrote: sthen@ says this is likely a bit optimistic. while most of our drivers unconditionally configure their max mru, there's some stupid ones that still interpret the configured mtu as a what the mru should be. dlg oce(4) and ix(4) need to be fixed. this might fix ix(4). anyone able to test? ix and oce (internally though) calculate various paramethers related to flow control, internal buffer sizes and so on based on the maximum frame size. therefore if i set max frame size to 16k but always use 1,5k all these calculations will be off. and by killing SIOCSIFMTU you make it impossible for the driver to readjust them. we have no idea at the moment how will this affect nic's behavior. pieces of such black magic such as the code under /* Hardware Packet Buffer Flow Control setup */ comment must be studied before making such changes. and as i've mentioned to dlg, ix diff also switches it to using 16k cluster pool no matter what you actually need. this can be changed, but needs testing (i'm doing it now).
Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu
On 27 August 2014 13:23, David Gwynne da...@gwynne.id.au wrote: On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote: On 20/08/14 8:03 PM, David Gwynne wrote: sthen@ says this is likely a bit optimistic. while most of our drivers unconditionally configure their max mru, there's some stupid ones that still interpret the configured mtu as a what the mru should be. dlg oce(4) and ix(4) need to be fixed. this might fix oce. can anyone test? works fine here. i think we can go with this one. OK mikeb
Re: bge(4) Jumbo support for newer chipsets
On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 (0x5719001), APE firmware NCSI 1.1.15.0 and Broadcom BCM5714 rev 0xa3, BCM5715 A3 (0x9003). it works, however i'm not strictly a fan of switching the cluster pool to larger one for 5714. wasting another 8k page (on sparc for example) for every rx cluster in 90% cases sounds kinda wrong to me. but ymmv. apart from that there's a deficiency in the diff itself. you probably want to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics look wrong. i'm certainly OK with !5714 part of the diff.
Re: pf: once for match rules?
On Tue, Aug 12, 2014 at 18:26 +0200, Mike Belopuhov wrote: On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote: Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? Henning thinks it's a bit of an overkill. Any other opinions? here we go then. OK? diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y index c277b8d..61c2646 100644 --- sbin/pfctl/parse.y +++ sbin/pfctl/parse.y @@ -1488,12 +1488,18 @@ pfrule : action dir logquick interface af proto fromto if ($8.marker FOM_SETPRIO) { r.set_prio[0] = $8.set_prio[0]; r.set_prio[1] = $8.set_prio[1]; r.scrub_flags |= PFSTATE_SETPRIO; } - if ($8.marker FOM_ONCE) + if ($8.marker FOM_ONCE) { + if (r.action == PF_MATCH) { + yyerror(can't specify once for + match rules); + YYERROR; + } r.rule_flag |= PFRULE_ONCE; + } if ($8.marker FOM_AFTO) r.rule_flag |= PFRULE_AFTO; r.af = $5; if ($8.tag)
/dev/crypto removal (1/3): unlink pseudo device
first step is to unlink crypto(4) pseudo device from the architecture dependant character device tables and kernel config files. OK? diff --git sys/arch/alpha/alpha/conf.c sys/arch/alpha/alpha/conf.c index 83cea34..7d103af 100644 --- sys/arch/alpha/alpha/conf.c +++ sys/arch/alpha/alpha/conf.c @@ -190,11 +190,11 @@ struct cdevsw cdevsw[] = #endif cdev_bio_init(NBIO,bio),/* 53: ioctl tunnel */ cdev_notdef(), cdev_ptm_init(NPTY,ptm),/* 55: pseudo-tty ptm device */ cdev_hotplug_init(NHOTPLUG,hotplug), /* 56: devices hot plugging */ - cdev_crypto_init(NCRYPTO,crypto), /* 57: /dev/crypto */ + cdev_notdef(), /* 57: was: /dev/crypto */ cdev_bktr_init(NBKTR,bktr), /* 58: Bt848 video capture device */ cdev_radio_init(NRADIO,radio), /* 59: generic radio I/O */ cdev_mouse_init(NWSMUX, wsmux), /* 60: ws multiplexor */ cdev_vscsi_init(NVSCSI, vscsi), /* 61: vscsi */ cdev_notdef(), diff --git sys/arch/alpha/conf/GENERIC sys/arch/alpha/conf/GENERIC index 7b0c790..dbc8bdc 100644 --- sys/arch/alpha/conf/GENERIC +++ sys/arch/alpha/conf/GENERIC @@ -422,8 +422,7 @@ option ONEWIREVERBOSE owid* at onewire? # ID owsbm* at onewire? # Smart Battery Monitor owtemp* at onewire?# Temperature owctr* at onewire? # Counter device -pseudo-device crypto 1 pseudo-device hotplug 1 # devices hot plugging pseudo-device wsmux 2 # mouse keyboard multiplexor diff --git sys/arch/amd64/amd64/conf.c sys/arch/amd64/amd64/conf.c index 59d4c9b..5da5820 100644 --- sys/arch/amd64/amd64/conf.c +++ sys/arch/amd64/amd64/conf.c @@ -38,12 +38,10 @@ #include sys/tty.h #include sys/vnode.h #include machine/conf.h -#include inet.h - #include wd.h bdev_decl(wd); #include fdc.h #include fd.h bdev_decl(fd); @@ -254,11 +252,11 @@ struct cdevsw cdevsw[] = cdev_tty_init(NUCOM,ucom), /* 66: USB tty */ cdev_mouse_init(NWSKBD, wskbd), /* 67: keyboards */ cdev_mouse_init(NWSMOUSE, /* 68: mice */ wsmouse), cdev_mouse_init(NWSMUX, wsmux), /* 69: ws multiplexor */ - cdev_crypto_init(NCRYPTO,crypto), /* 70: /dev/crypto */ + cdev_notdef(), /* 70: was: /dev/crypto */ cdev_tty_init(NCZ,cztty), /* 71: Cyclades-Z serial port */ #ifdef USER_PCICONF cdev_pci_init(NPCI,pci),/* 72: PCI user */ #else cdev_notdef(), diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC index 09fd692..3177dd8 100644 --- sys/arch/amd64/conf/GENERIC +++ sys/arch/amd64/conf/GENERIC @@ -600,11 +600,10 @@ pseudo-device pctr1 pseudo-device nvram 1 pseudo-device hotplug 1 # devices hot plugging # mouse keyboard multiplexor pseudo-devices pseudo-device wsmux 2 -pseudo-device crypto 1 # Virtio devices virtio*at pci? # Virtio PCI device vioblk*at virtio? # Virtio block device vio* at virtio? # Virtio network device diff --git sys/arch/arm/arm/conf.c sys/arch/arm/arm/conf.c index d381885..8f12732 100644 --- sys/arch/arm/arm/conf.c +++ sys/arch/arm/arm/conf.c @@ -53,12 +53,10 @@ #include sys/conf.h #include sys/vnode.h #include machine/conf.h -#include inet.h - /* * From this point, these need to be MI foo.h files. */ /* @@ -321,11 +319,11 @@ struct cdevsw cdevsw[] = { cdev_lkm_dummy(), /* 42: reserved */ cdev_lkm_dummy(), /* 43: reserved */ cdev_lkm_dummy(), /* 44: reserved */ cdev_lkm_dummy(), /* 45: reserved */ cdev_pf_init(NPF,pf), /* 46: packet filter */ - cdev_crypto_init(NCRYPTO,crypto), /* 47: /dev/crypto */ + cdev_notdef(), /* 47: was: /dev/crypto */ cdev_lkm_dummy(), /* 48: reserved */ cdev_lkm_dummy(), /* 49: reserved */ cdev_systrace_init(NSYSTRACE,systrace), /* 50: system call tracing */ cdev_notdef(), /* 51: reserved */ cdev_bio_init(NBIO,bio),/* 52: ioctl tunnel */ diff --git sys/arch/armish/conf/GENERIC sys/arch/armish/conf/GENERIC index e72ff41..6bf908c 100644 --- sys/arch/armish/conf/GENERIC +++ sys/arch/armish/conf/GENERIC @@ -195,7 +195,6 @@ owsbm* at onewire? # Smart Battery Monitor owtemp* at onewire?# Temperature owctr* at onewire? # Counter device # mouse keyboard multiplexor pseudo-devices pseudo-device wsmux 2 -pseudo-device crypto 1 pseudo-device hotplug 1 # devices hot plugging diff --git sys/arch/hppa/hppa/conf.c sys/arch/hppa/hppa/conf.c
/dev/crypto removal (2/3): remove kernel support
this removes /dev/crypto device interface and public key functionality that is only usable via /dev/crypto. OK? diff --git sys/conf/files sys/conf/files index 3941639..9af78cc 100644 --- sys/conf/files +++ sys/conf/files @@ -870,11 +870,10 @@ file crypto/blf.c (inet ipsec) | crypto | vnd file crypto/cast.c (inet ipsec) | crypto file crypto/ecb_enc.c (inet ipsec) | crypto file crypto/set_key.c (inet ipsec) | crypto file crypto/ecb3_enc.c (inet ipsec) | crypto file crypto/crypto.c (inet ipsec) | crypto -file crypto/cryptodev.c((inet ipsec) | crypto) needs-flag file crypto/criov.c(inet ipsec) | crypto file crypto/cryptosoft.c (inet ipsec) | crypto file crypto/xform.c(inet ipsec) | crypto file crypto/xform_ipcomp.c (inet ipsec) | crypto file crypto/arc4.c diff --git sys/crypto/crypto.c sys/crypto/crypto.c index f62752d..4a11ae3 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags) /* * Register a crypto driver. It should be called once for each algorithm * supported by the driver. */ int -crypto_kregister(u_int32_t driverid, int *kalg, -int (*kprocess)(struct cryptkop *)) -{ - int s, i; - - if (driverid = crypto_drivers_num || kalg == NULL || - crypto_drivers == NULL) - return EINVAL; - - s = splvm(); - - for (i = 0; i = CRK_ALGORITHM_MAX; i++) { - /* -* XXX Do some performance testing to determine -* placing. We probably need an auxiliary data -* structure that describes relative performances. -*/ - - crypto_drivers[driverid].cc_kalg[i] = kalg[i]; - } - - crypto_drivers[driverid].cc_kprocess = kprocess; - - splx(s); - return 0; -} - -/* Register a crypto driver. */ -int crypto_register(u_int32_t driverid, int *alg, int (*newses)(u_int32_t *, struct cryptoini *), int (*freeses)(u_int64_t), int (*process)(struct cryptop *)) { int s, i; @@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp) } return 0; } -int -crypto_kdispatch(struct cryptkop *krp) -{ - if (crypto_taskq) { - task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); - task_add(crypto_taskq, krp-krp_task); - } else { - crypto_kinvoke(krp); - } - - return 0; -} - -/* - * Dispatch an asymmetric crypto request to the appropriate crypto devices. - */ -int -crypto_kinvoke(struct cryptkop *krp) -{ - extern int cryptodevallowsoft; - u_int32_t hid; - int error; - int s; - - /* Sanity checks. */ - if (krp == NULL || krp-krp_callback == NULL) - return (EINVAL); - - s = splvm(); - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) - continue; - if (crypto_drivers[hid].cc_kprocess == NULL) - continue; - if ((crypto_drivers[hid].cc_kalg[krp-krp_op] - CRYPTO_ALG_FLAG_SUPPORTED) == 0) - continue; - break; - } - - if (hid == crypto_drivers_num) { - krp-krp_status = ENODEV; - crypto_kdone(krp); - splx(s); - return (0); - } - - krp-krp_hid = hid; - - crypto_drivers[hid].cc_koperations++; - - error = crypto_drivers[hid].cc_kprocess(krp); - if (error) { - krp-krp_status = error; - crypto_kdone(krp); - } - splx(s); - return (0); -} - /* * Dispatch a crypto request to the appropriate crypto devices. */ int crypto_invoke(struct cryptop *crp) @@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp) task_set(crp-crp_task, (void (*))crp-crp_callback, crp, NULL); task_add(crypto_taskq, crp-crp_task); } } - -/* - * Invoke the callback on behalf of the driver. - */ -void -crypto_kdone(struct cryptkop *krp) -{ - task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); - task_add(crypto_taskq, krp-krp_task); -} - -int -crypto_getfeat(int *featp) -{ - extern int cryptodevallowsoft, userasymcrypto; - int hid, kalg, feat = 0; - - if (userasymcrypto == 0) - goto out; - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) { - continue; - } - if
/dev/crypto removal (3/3): userland bits
please note that the commented out example usage in etc/MAKEDEV.common remains till someone feels the need to change it. OK? diff --git etc/MAKEDEV.common etc/MAKEDEV.common index bfcd943..b656d46 100644 --- etc/MAKEDEV.common +++ etc/MAKEDEV.common @@ -131,11 +131,10 @@ target(all, wd, 0, 1, 2, 3)dnl target(all, xd, 0, 1, 2, 3)dnl target(all, systrace)dnl target(all, pctr)dnl target(all, pctr0)dnl target(all, pf)dnl -twrget(all, cry, crypto)dnl target(all, apm)dnl target(all, acpi)dnl twrget(all, tth, ttyh, 0, 1)dnl target(all, ttyA, 0, 1)dnl target(all, ttyB, 0, 1, 2, 3, 4, 5)dnl @@ -446,12 +445,10 @@ __devitem(fdesc, fd, fd/* nodes, fd)dnl _mkdev(fdesc, fd, {-RMlist[${#RMlist[*]}]=;mkdir -p fd;rm -f n=0 while [ $n -lt 64 ];do M fd/$n c major_fdesc_c $n;n=Add($n, 1);done MKlist[${#MKlist[*]}]=;chmod 555 fd-})dnl __devitem(oppr, openprom,PROM settings,openprom)dnl _cdev(oppr, openprom, 70, 0)dnl -__devitem(cry, crypto, Hardware crypto access driver,crypto)dnl -_mkdev(cry, crypto, {-M crypto c major_cry_c-} 0)dnl __devitem(pf, pf*, Packet Filter)dnl _mkdev(pf, {-pf*-}, {-M pf c major_pf_c 0 600-})dnl __devitem(bpf, bpf*, Berkeley Packet Filter)dnl _mkdev(bpf, {-bpf*-}, {-M bpf$U c major_bpf_c $U 600-}, 600)dnl _mkdev(tun, {-tun*-}, {-M tun$U c major_tun_c $U 600-}, 600)dnl diff --git etc/etc.alpha/MAKEDEV etc/etc.alpha/MAKEDEV index 2dd9859..050aa4e 100644 --- etc/etc.alpha/MAKEDEV +++ etc/etc.alpha/MAKEDEV @@ -68,11 +68,10 @@ # Special purpose devices: # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # fd fd/* nodes # fuseUserland Filesystem # hotplug devices hot plugging # lkm Loadable kernel modules interface @@ -324,14 +323,10 @@ fd) diskmap) M diskmap c 63 0 640 operator ;; -crypto) - M crypto c 57 0 - ;; - bpf*) M bpf$U c 11 $U 600 ;; bktr*) @@ -539,11 +534,11 @@ all) R local wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2 R rmidi3 rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker R video0 video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm R tty00 tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09 R tty0a tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7 - R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 crypto pf systrace wd0 + R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 pf systrace wd0 R wd1 wd2 wd3 std st0 st1 fd ;; wd*|sd*) case $i in diff --git etc/etc.amd64/MAKEDEV etc/etc.amd64/MAKEDEV index 2971021..4a0d05c 100644 --- etc/etc.amd64/MAKEDEV +++ etc/etc.amd64/MAKEDEV @@ -68,11 +68,10 @@ # apm Power Management Interface # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # drm*Direct Rendering Manager # fd fd/* nodes # fuseUserland Filesystem # gpio* General Purpose Input/Output @@ -344,14 +343,10 @@ drm*) diskmap) M diskmap c 90 0 640 operator ;; -crypto) - M crypto c 70 0 - ;; - bpf*) M bpf$U c 23 $U 600 ;; bktr*) @@ -565,11 +560,11 @@ all) R wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3 R rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker video0 R video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm tty00 R tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09 tty0a R tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7 apm - R crypto pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd + R pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd ;; wd*|sd*) case $i in wd*) dodisk wd $U 0 3 $U 0;; diff --git etc/etc.armish/MAKEDEV etc/etc.armish/MAKEDEV index 7a3a21e..25fd4d4 100644 --- etc/etc.armish/MAKEDEV +++ etc/etc.armish/MAKEDEV @@ -62,11 +62,10 @@ # apm Power management device # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # fd fd/* nodes # fuseUserland Filesystem # hotplug devices hot plugging # lkm Loadable kernel modules interface @@ -302,14 +301,10 @@ fd) diskmap) M diskmap c 102 0 640 operator ;; -crypto) - M crypto c 47 0 - ;; - bpf*) M bpf$U c 22 $U 600 ;; bktr*) @@ -481,11 +476,11 @@ all) R bpf5 bpf6 bpf7 bpf8 bpf9 pty0 diskmap vscsi0 ch0 audio0 R audio1 audio2 fuse pppx hotplug ptm local wscons pci0 pci1 R pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3 rmidi4 rmidi5
Re: /dev/crypto removal (2/3): remove kernel support
On Mon, Aug 18, 2014 at 16:03 +0200, Mike Belopuhov wrote: this removes /dev/crypto device interface and public key functionality that is only usable via /dev/crypto. OK? minor correction: preserve #ifdef _KERNEL in the cryptodev.h. while there are no userland programs including it atm leaving it for the symmetry with cryptosoft.c seems like a good idea. diff --git sys/conf/files sys/conf/files index 3941639..9af78cc 100644 --- sys/conf/files +++ sys/conf/files @@ -870,11 +870,10 @@ file crypto/blf.c (inet ipsec) | crypto | vnd file crypto/cast.c (inet ipsec) | crypto file crypto/ecb_enc.c (inet ipsec) | crypto file crypto/set_key.c (inet ipsec) | crypto file crypto/ecb3_enc.c (inet ipsec) | crypto file crypto/crypto.c (inet ipsec) | crypto -file crypto/cryptodev.c((inet ipsec) | crypto) needs-flag file crypto/criov.c(inet ipsec) | crypto file crypto/cryptosoft.c (inet ipsec) | crypto file crypto/xform.c(inet ipsec) | crypto file crypto/xform_ipcomp.c (inet ipsec) | crypto file crypto/arc4.c diff --git sys/crypto/crypto.c sys/crypto/crypto.c index f62752d..4a11ae3 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags) /* * Register a crypto driver. It should be called once for each algorithm * supported by the driver. */ int -crypto_kregister(u_int32_t driverid, int *kalg, -int (*kprocess)(struct cryptkop *)) -{ - int s, i; - - if (driverid = crypto_drivers_num || kalg == NULL || - crypto_drivers == NULL) - return EINVAL; - - s = splvm(); - - for (i = 0; i = CRK_ALGORITHM_MAX; i++) { - /* -* XXX Do some performance testing to determine -* placing. We probably need an auxiliary data -* structure that describes relative performances. -*/ - - crypto_drivers[driverid].cc_kalg[i] = kalg[i]; - } - - crypto_drivers[driverid].cc_kprocess = kprocess; - - splx(s); - return 0; -} - -/* Register a crypto driver. */ -int crypto_register(u_int32_t driverid, int *alg, int (*newses)(u_int32_t *, struct cryptoini *), int (*freeses)(u_int64_t), int (*process)(struct cryptop *)) { int s, i; @@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp) } return 0; } -int -crypto_kdispatch(struct cryptkop *krp) -{ - if (crypto_taskq) { - task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); - task_add(crypto_taskq, krp-krp_task); - } else { - crypto_kinvoke(krp); - } - - return 0; -} - -/* - * Dispatch an asymmetric crypto request to the appropriate crypto devices. - */ -int -crypto_kinvoke(struct cryptkop *krp) -{ - extern int cryptodevallowsoft; - u_int32_t hid; - int error; - int s; - - /* Sanity checks. */ - if (krp == NULL || krp-krp_callback == NULL) - return (EINVAL); - - s = splvm(); - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) - continue; - if (crypto_drivers[hid].cc_kprocess == NULL) - continue; - if ((crypto_drivers[hid].cc_kalg[krp-krp_op] - CRYPTO_ALG_FLAG_SUPPORTED) == 0) - continue; - break; - } - - if (hid == crypto_drivers_num) { - krp-krp_status = ENODEV; - crypto_kdone(krp); - splx(s); - return (0); - } - - krp-krp_hid = hid; - - crypto_drivers[hid].cc_koperations++; - - error = crypto_drivers[hid].cc_kprocess(krp); - if (error) { - krp-krp_status = error; - crypto_kdone(krp); - } - splx(s); - return (0); -} - /* * Dispatch a crypto request to the appropriate crypto devices. */ int crypto_invoke(struct cryptop *crp) @@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp) task_set(crp-crp_task, (void (*))crp-crp_callback, crp, NULL); task_add(crypto_taskq, crp-crp_task); } } - -/* - * Invoke the callback on behalf of the driver. - */ -void -crypto_kdone(struct cryptkop *krp) -{ - task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); - task_add(crypto_taskq, krp-krp_task); -} - -int -crypto_getfeat(int *featp) -{ - extern int cryptodevallowsoft, userasymcrypto; - int hid, kalg, feat = 0; - - if (userasymcrypto == 0) - goto out; - for (hid
/dev/crypto removal (3bis): DTYPE_CRYPTO
I don't know if we recycle them somehow, but just in case... diff --git sys/sys/file.h sys/sys/file.h index d98118e..64c0f31 100644 --- sys/sys/file.h +++ sys/sys/file.h @@ -67,11 +67,11 @@ struct file { short f_flag; /* see fcntl.h */ #defineDTYPE_VNODE 1 /* file */ #defineDTYPE_SOCKET2 /* communications endpoint */ #defineDTYPE_PIPE 3 /* pipe */ #defineDTYPE_KQUEUE4 /* event queue */ -#defineDTYPE_CRYPTO5 /* crypto */ +/* was define DTYPE_CRYPTO5 */ #defineDTYPE_SYSTRACE 6 /* system call tracing */ short f_type; /* descriptor type */ longf_count;/* reference count */ longf_msgcount; /* references from message queue */ struct ucred *f_cred; /* credentials associated with descriptor */
Re: Kill MRT_{ADD,DEL}_BW_UPCALL
On 13 August 2014 10:56, Martin Pieuchot mpieuc...@nolizard.org wrote: Our multicast routing code is insert your adjective and for the most part unused. We discussed with claudio@ during t2k13 to rewrite only the parts that people currently use, any volunteer? In the meantime, I'd like to kill the obviously unused parts of it. So here's a first diff that remove the bandwidth monitoring interface. Nothing use it in base and a quick search on codesearch.debian.net reveals that only net/xorp picks it if it finds the defines. Ok? OK
Re: Fix pppoe(4) with rdomain != 0
OK On 13 August 2014 11:56, Martin Pieuchot mpieuc...@nolizard.org wrote: ok? Index: net/if_pppoe.c === RCS file: /home/ncvs/src/sys/net/if_pppoe.c,v retrieving revision 1.40 diff -u -p -r1.40 if_pppoe.c --- net/if_pppoe.c 12 Jul 2014 18:44:22 - 1.40 +++ net/if_pppoe.c 13 Aug 2014 09:56:16 - @@ -1398,6 +1398,9 @@ pppoe_send_padt(struct ifnet *outgoing_i memcpy(eh-ether_dhost, dest, ETHER_ADDR_LEN); m0-m_flags = ~(M_BCAST|M_MCAST); + /* encapsulated packet is forced into rdomain of physical interface */ + m0-m_pkthdr.ph_rtableid = outgoing_if-if_rdomain; + return (outgoing_if-if_output(outgoing_if, m0, dst, NULL)); }
[regress] convert aes testcase from /dev/crypto
in order to deprecate crypto(4) interface regress tests need to be converted. this aes test case actually uses ecb vectors, therefore no chaining is required and the code looks very simple. OK? diff --git regress/sys/crypto/aes/Makefile regress/sys/crypto/aes/Makefile index 459aedb..826d98c 100644 --- regress/sys/crypto/aes/Makefile +++ regress/sys/crypto/aes/Makefile @@ -1,9 +1,13 @@ # $OpenBSD: Makefile,v 1.2 2014/01/18 05:54:52 martynas Exp $ -PROG= aestest +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} +PROG= aestest +SRCS= aestest.c CDIAGFLAGS=-Wall #CDIAGFLAGS+= -Werror CDIAGFLAGS+= -Wpointer-arith CDIAGFLAGS+= -Wno-uninitialized CDIAGFLAGS+= -Wstrict-prototypes @@ -12,9 +16,12 @@ CDIAGFLAGS+= -Wunused CDIAGFLAGS+= -Wsign-compare CDIAGFLAGS+= -Wshadow REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= rijndael.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} ${.CURDIR}/vectors/*.txt + ./${PROG} ${.CURDIR}/vectors/*.txt .include bsd.regress.mk diff --git regress/sys/crypto/aes/aestest.c regress/sys/crypto/aes/aestest.c index 2437c38..720dbc1 100644 --- regress/sys/crypto/aes/aestest.c +++ regress/sys/crypto/aes/aestest.c @@ -24,117 +24,39 @@ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /* - * Test crypto(4) AES with test vectors provided by Dr Brian Gladman: - * http://fp.gladman.plus.com/AES/ + * Test kernel AES implementation with test vectors provided by + * Dr Brian Gladman: http://fp.gladman.plus.com/AES/ */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h +#include crypto/rijndael.h #include err.h -#include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h #include ctype.h static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *in, +docrypt(const unsigned char *key, size_t klen, const unsigned char *in, unsigned char *out, size_t len, int do_encrypt) { - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; - u_char iv[32]; - - /* -* Kludge; the kernel doesn't support ECB encryption so we -* use a all-zero IV and encrypt a single block only, so the -* result should be the same. -*/ - bzero(iv, sizeof(iv)); - - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_AES_CBC; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = do_encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); - -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); -} - -static int -getallowsoft(void) -{ - int mib[2], old; - size_t olen; - - olen = sizeof(old); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); - - return old; -} - -static void -setallowsoft(int new) -{ - int mib[2], old; - size_t olen, nlen; - - olen = nlen = sizeof(new); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - - if (sysctl(mib, 2, old, olen, new, nlen) 0) - err(1, sysctl failed); + rijndael_ctx ctx; + int error = 0; + + memset(ctx, 0, sizeof(ctx)); + error = rijndael_set_key(ctx, key, klen * 8); + if (error) + return -1; + if (do_encrypt) + rijndael_encrypt(ctx, in, out); + else + rijndael_decrypt(ctx, in, out); + return 0; } static int match(unsigned char *a, unsigned char *b, size_t len) { @@ -221,21 +143,21 @@ do_tests(const char *filename, int test_num, u_char *key, u_int keylen, { char result[32]; int fail = 0; /* Encrypt test */
[regress] convert aes-ctr test from /dev/crypto
this test is converted the same way jsing@ has recently converted an xts test by pulling in xform.c code. OK? diff --git regress/sys/crypto/aesctr/Makefile regress/sys/crypto/aesctr/Makefile index 31ae500..7310dbc 100644 --- regress/sys/crypto/aesctr/Makefile +++ regress/sys/crypto/aesctr/Makefile @@ -1,10 +1,29 @@ # $OpenBSD: Makefile,v 1.1 2005/05/25 05:47:53 markus Exp $ +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} + PROG= aesctr +SRCS= aesctr.c + +CDIAGFLAGS=-Wall +CDIAGFLAGS+= -Werror +CDIAGFLAGS+= -Wpointer-arith +CDIAGFLAGS+= -Wno-uninitialized +CDIAGFLAGS+= -Wstrict-prototypes +CDIAGFLAGS+= -Wmissing-prototypes +CDIAGFLAGS+= -Wunused +CDIAGFLAGS+= -Wsign-compare +CDIAGFLAGS+= -Wshadow REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c +SRCS+= xform.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} + ./${PROG} .include bsd.regress.mk diff --git regress/sys/crypto/aesctr/aesctr.c regress/sys/crypto/aesctr/aesctr.c index 4cc1a6e..3a0b4d1 100644 --- regress/sys/crypto/aesctr/aesctr.c +++ regress/sys/crypto/aesctr/aesctr.c @@ -14,17 +14,13 @@ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h +#include crypto/rijndael.h #include err.h -#include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h #include limits.h @@ -128,92 +124,67 @@ struct { B4 07 DF 86 65 69 FD 07 F4 8C C0 B5 83 D6 07 1F /*1E C0 E6 B8*/, }, }; -static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv, -const unsigned char *in, unsigned char *out, size_t len, int encrypt) -{ - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; +/* Stubs */ - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_AES_CTR; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); +u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **); -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); +u_int32_t +deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out) +{ + return 0; } -static int -getallowsoft(void) +void explicit_bzero(void *, size_t); + +void +explicit_bzero(void *b, size_t len) { - int mib[2], old; - size_t olen; + bzero(b, len); +} - olen = sizeof(old); +/* Definitions from /sys/crypto/xform.c */ - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); +#define AESCTR_NONCESIZE 4 +#define AESCTR_IVSIZE 8 +#define AESCTR_BLOCKSIZE 16 - return old; -} +struct aes_ctr_ctx { + u_int32_t ac_ek[4*(AES_MAXROUNDS + 1)]; + u_int8_tac_block[AESCTR_BLOCKSIZE]; + int ac_nr; +}; -static void -setallowsoft(int new) -{ - int mib[2], old; - size_t olen, nlen; +int aes_ctr_setkey(void *, u_int8_t *, int); +void aes_ctr_encrypt(caddr_t, u_int8_t *); +void aes_ctr_decrypt(caddr_t, u_int8_t *); +void aes_ctr_reinit(caddr_t, u_int8_t *); - olen = nlen = sizeof(new); +static int +docrypt(const unsigned char *key, size_t klen, const unsigned char *iv, +const unsigned char *in, unsigned char *out, size_t len, int encrypt) +{ + u_int8_t block[AESCTR_BLOCKSIZE]; + struct aes_ctr_ctx ctx; + int error = 0; + size_t i; - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; +
[regress] convert enc (3des) test from /dev/crypto
this one with a bit of cheating however (manual cbc implementation). OK? diff --git regress/sys/crypto/enc/Makefile regress/sys/crypto/enc/Makefile index cc29b32..8725f0c 100644 --- regress/sys/crypto/enc/Makefile +++ regress/sys/crypto/enc/Makefile @@ -1,12 +1,21 @@ # $OpenBSD: Makefile,v 1.5 2010/10/15 10:39:12 jsg Exp $ +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} + PROG= des3 +SRCS= des3.c LDADD=-lcrypto DPADD=${LIBCRYPTO} REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c +SRCS+= xform.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} + ./${PROG} .include bsd.regress.mk diff --git regress/sys/crypto/enc/des3.c regress/sys/crypto/enc/des3.c index 024418d..fe67872 100644 --- regress/sys/crypto/enc/des3.c +++ regress/sys/crypto/enc/des3.c @@ -22,105 +22,73 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h #include openssl/des.h #include err.h #include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h -static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv, -const unsigned char *in, unsigned char *out, size_t len, int encrypt) -{ - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; - - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_3DES_CBC; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); +/* Stubs */ -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); -} +u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **); -static int -getallowsoft(void) +u_int32_t +deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out) { - int mib[2], old; - size_t olen; - - olen = sizeof(old); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); - - return old; + return 0; } -static void -setallowsoft(int new) +void explicit_bzero(void *, size_t); + +void +explicit_bzero(void *b, size_t len) { - int mib[2], old; - size_t olen, nlen; + bzero(b, len); +} - olen = nlen = sizeof(new); - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; +/* Simulate CBC mode */ - if (sysctl(mib, 2, old, olen, new, nlen) 0) - err(1, sysctl failed); +static int +docrypt(const unsigned char *key, size_t klen, const unsigned char *iv0, +const unsigned char *in, unsigned char *out, size_t len, int encrypt) +{ + u_int8_t block[8], iv[8], iv2[8], *ivp = iv, *nivp; + u_int8_t ctx[384]; + int i, j, error = 0; + + memcpy(iv, iv0, 8); + memset(ctx, 0, sizeof(ctx)); + error = des3_setkey(ctx, key, klen); + if (error) + return -1; + for (i = 0; i len / 8; i ++) { + bcopy(in, block, 8); + in += 8; + if (encrypt) { + for (j = 0; j 8; j++) + block[j] ^= ivp[j]; + des3_encrypt(ctx, block); + memcpy(ivp, block, 8); + } else { + nivp = ivp == iv ? iv2 : iv; + memcpy(nivp, block, 8); + des3_decrypt(ctx, block); + for (j = 0; j 8; j++) + block[j] ^= ivp[j]; + ivp = nivp; + } +
Re: pf: once for match rules?
On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote: Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? Henning thinks it's a bit of an overkill. Any other opinions? diff --git sys/net/pf.c sys/net/pf.c index 9f0e2d6..5679a40 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, PR_NOWAIT)) == NULL) { REASON_SET(reason, PFRES_MEMORY); goto cleanup; } ri-r = r; + ri-ar = a; + ri-rs = ruleset; + ri-ars = aruleset; /* order is irrelevant */ SLIST_INSERT_HEAD(rules, ri, entry); pf_rule_to_actions(r, act); - if (r-rule_flag PFRULE_AFTO) - pd-naf = r-naf; if (pf_get_transaddr(r, pd, sns, nr) == -1) { REASON_SET(reason, PFRES_TRANSLATE); goto cleanup; } if (r-log || act.log PF_LOG_MATCHES) { @@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, virtual_type, icmp_dir); } } else { while ((ri = SLIST_FIRST(rules))) { SLIST_REMOVE_HEAD(rules, entry); + if (ri-r-rule_flag PFRULE_ONCE) + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); pool_put(pf_rule_item_pl, ri); } } /* copy back packet headers if needed */ @@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } #endif if (r-rule_flag PFRULE_ONCE) pf_purge_rule(ruleset, r, aruleset, a); + if (*sm) { + SLIST_FOREACH(ri, (*sm)-match_rules, entry) { + if (ri-r-rule_flag PFRULE_ONCE) + /* + * We can be sure that pf_purge_rule won't + * pool_put the rule because when *sm != NULL + * STATE_INC_COUNTERS has increased states_cur. + * pf_rule_item's and rules will be g/c'ed by + * pf_free_state. + */ + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); + } + } #if INET INET6 if (rewrite skw-af != sks-af) return (PF_AFRT); #endif /* INET INET6 */ diff --git sys/net/pfvar.h sys/net/pfvar.h index a0d94f7..49af7b4 100644 --- sys/net/pfvar.h +++ sys/net/pfvar.h @@ -691,10 +691,13 @@ struct pf_threshold { }; struct pf_rule_item { SLIST_ENTRY(pf_rule_item)entry; struct pf_rule *r; + struct pf_rule *ar; + struct pf_ruleset *rs; + struct pf_ruleset *ars; }; SLIST_HEAD(pf_rule_slist, pf_rule_item); enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX };
pf: fixup pf_step_into_anchor to save current anchor rule pointer (2)
Hi, This is a second diff and it makes sure that pf_step_into_anchor always saves a pointer to the rule that owns the anchor on the pf anchor stack. There's no reason why we should check for depth here. As a side effect this makes sure that the correct nested anchor gets it's counter bumped instead of the top most. For the save/restore symmetry pf_step_out_of_anchor is made to always restore previous value of the anchor rule. depth == 0 means what we a at the top (main ruleset). OK? diff --git sys/net/pf.c sys/net/pf.c index 9832e47..8708417 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -2666,11 +2666,11 @@ pf_step_into_anchor(int *depth, struct pf_ruleset **rs, if (*depth = sizeof(pf_anchor_stack) / sizeof(pf_anchor_stack[0])) { log(LOG_ERR, pf_step_into_anchor: stack overflow\n); *r = TAILQ_NEXT(*r, entries); return; - } else if (*depth == 0 a != NULL) + } else if (a != NULL) *a = *r; f = pf_anchor_stack + (*depth)++; f-rs = *rs; f-r = *r; if ((*r)-anchor_wildcard) { @@ -2711,10 +2711,12 @@ pf_step_out_of_anchor(int *depth, struct pf_ruleset **rs, } } (*depth)--; if (*depth == 0 a != NULL) *a = NULL; + else if (a != NULL) + *a = f-r; *rs = f-rs; if (*match *depth) { *match = *depth; if (f-r-quick) quick = 1;
pf: once for match rules?
Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? diff --git sys/net/pf.c sys/net/pf.c index 9f0e2d6..5679a40 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, PR_NOWAIT)) == NULL) { REASON_SET(reason, PFRES_MEMORY); goto cleanup; } ri-r = r; + ri-ar = a; + ri-rs = ruleset; + ri-ars = aruleset; /* order is irrelevant */ SLIST_INSERT_HEAD(rules, ri, entry); pf_rule_to_actions(r, act); - if (r-rule_flag PFRULE_AFTO) - pd-naf = r-naf; if (pf_get_transaddr(r, pd, sns, nr) == -1) { REASON_SET(reason, PFRES_TRANSLATE); goto cleanup; } if (r-log || act.log PF_LOG_MATCHES) { @@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, virtual_type, icmp_dir); } } else { while ((ri = SLIST_FIRST(rules))) { SLIST_REMOVE_HEAD(rules, entry); + if (ri-r-rule_flag PFRULE_ONCE) + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); pool_put(pf_rule_item_pl, ri); } } /* copy back packet headers if needed */ @@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } #endif if (r-rule_flag PFRULE_ONCE) pf_purge_rule(ruleset, r, aruleset, a); + if (*sm) { + SLIST_FOREACH(ri, (*sm)-match_rules, entry) { + if (ri-r-rule_flag PFRULE_ONCE) + /* +* We can be sure that pf_purge_rule won't +* pool_put the rule because when *sm != NULL +* STATE_INC_COUNTERS has increased states_cur. +* pf_rule_item's and rules will be g/c'ed by +* pf_free_state. +*/ + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); + } + } #if INET INET6 if (rewrite skw-af != sks-af) return (PF_AFRT); #endif /* INET INET6 */ diff --git sys/net/pfvar.h sys/net/pfvar.h index a0d94f7..49af7b4 100644 --- sys/net/pfvar.h +++ sys/net/pfvar.h @@ -691,10 +691,13 @@ struct pf_threshold { }; struct pf_rule_item { SLIST_ENTRY(pf_rule_item)entry; struct pf_rule *r; + struct pf_rule *ar; + struct pf_ruleset *rs; + struct pf_ruleset *ars; }; SLIST_HEAD(pf_rule_slist, pf_rule_item); enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX };
Re: [PATCH] rdomain support on rc.d
On 11 July 2014 10:29, Antoine Jacoutot ajacou...@bsdfrog.org wrote: On Thu, Jul 10, 2014 at 06:51:01PM +0200, Loďc BLOT wrote: Hello all, I use rdomains to split routing domains per company and also separate administration interfaces from routing interfaces on my routers (sshd, bacula, postfix and puppetd running on a dedicated rdomain) Actually there is a problem with rdomains, we need to modify /etc/rc.d scripts to add rdomain execution environment to the specified service. If rc.subr have support to rdomains, we can let the rc.d scripts clean. To resolve those rdomain issues, I created a patch and I added a new variable we could use on rc.conf(.local), ${_name}_rdomain. (This variable needs a signed integer and use an existing rdomain, this is checked by rc.subr. I want to contribute to OpenBSD and I give you this patch. If you have any suggestions to improve it, tell me. I don't use rdomain so someone knowledgeable should comment here. But it does look like a nice idea. having something like this would be really cool. in case you'll be tweaking the code, make sure that the route -T exec printf check is preserved. i would use true in this test however. as far as i can tell the daemon_rdomain bit that goes into the rc script is fine, however i'm not quite sure how can i start two daemons in different rdomains via rc.conf.local. looks like this diff doesn't handle this and allows only one instance in the ${_name}_rdomain rdomain. but sometimes you want multiple, say sshd in rdomain 0 and 1. daemon_rdomain flag allows me to go and create another rc.d/sshd-rdomain-1 script and stuff daemon_rdomain=1 in there. but then i'd have to add it to the pkg_scripts... this is a minor issue that i see. perhaps ${_name}_rdomain should list multiple values, like sshd_rdomain=0,1,2,3.
Re: PF Once rules are not removed from main anchor
On 21 June 2014 15:36, Alexandr Nedvedicky alexandr.nedvedi...@oracle.com wrote: Hello, I'm not sure it is the right place to submit patches. Let me know if there is better/more appropriate address for this. during our testing we've found the once rules are not removed, when used in main anchor. Correct. I've addedd the ruleset pointer check to prevent that shortly before the release since it caused panics... during debugging we found the rules in main anchor have member anchor set to NULL (pf_rule::anchor). This makes pf_purge_rule() function to bail out to early without removing the rule from ruleset. patch below fixed problem for us. However, this solution is not correct for us. Perhaps you have some other changes in your tree to make it work. Index: pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.272 diff -u -r1.272 pf_ioctl.c --- pf_ioctl.c 22 Apr 2014 14:41:03 - 1.272 +++ pf_ioctl.c 20 Jun 2014 14:26:22 - @@ -312,7 +312,7 @@ { u_int32_tnr; - if (ruleset == NULL || ruleset-anchor == NULL) + if (ruleset == NULL) return; ruleset-anchor is useless since nothing really checks for it if ruleset is NULL. pf_rm_rule(ruleset-rules.active.ptr, rule); @@ -325,7 +325,10 @@ ruleset-rules.active.ticket++; pf_calc_skip_steps(ruleset-rules.active.ptr); - pf_remove_if_empty_ruleset(ruleset); + + if (ruleset != pf_main_ruleset) { + pf_remove_if_empty_ruleset(ruleset); + } } You don't need to check ruleset against pf_main_ruleset since the first thing pf_remove_if_empty_ruleset does is bail if ruleset == pf_main_ruleset. This bit confused me quite a bit. What really is a problem for us is that when pf_purge_rule is called on a main ruleset from pf_test_rule the first argument (ruleset) is NULL and not pf_main_ruleset, which would be sensible. The only other user of it, AFAICT, is pflog_packet but it checks for ruleset-anchor before it does it's thing. I don't see any reason why we can't start our rule set iteration with 'ruleset' set to pf_main_ruleset (regress tests agree). OK? diff --git sys/net/pf.c sys/net/pf.c index 71f85d1..562901d 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3163,10 +3163,11 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } break; #endif /* INET6 */ } + ruleset = pf_main_ruleset; r = TAILQ_FIRST(pf_main_ruleset.rules.active.ptr); while (r != NULL) { r-evaluations++; PF_TEST_ATTRIB((pfi_kif_match(r-kif, pd-kif) == r-ifnot), r-skip[PF_SKIP_IFP].ptr); diff --git sys/net/pf_ioctl.c sys/net/pf_ioctl.c index 5ad762c..86e987f 100644 --- sys/net/pf_ioctl.c +++ sys/net/pf_ioctl.c @@ -308,24 +308,19 @@ pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule *rule) } void pf_purge_rule(struct pf_ruleset *ruleset, struct pf_rule *rule) { - u_int32_tnr; + u_int32_tnr = 0; - if (ruleset == NULL || ruleset-anchor == NULL) - return; + KASSERT(ruleset != NULL rule != NULL); pf_rm_rule(ruleset-rules.active.ptr, rule); ruleset-rules.active.rcount--; - - nr = 0; TAILQ_FOREACH(rule, ruleset-rules.active.ptr, entries) rule-nr = nr++; - ruleset-rules.active.ticket++; - pf_calc_skip_steps(ruleset-rules.active.ptr); pf_remove_if_empty_ruleset(ruleset); } u_int16_t
Re: ANONCVS MIRROR MAINTAINERS.. YOU NEED TO READ THIS!
On 26 June 2014 08:53, patrick keshishian sids...@boxsoft.com wrote: On Wed, Jun 25, 2014 at 10:01:06PM -0700, patrick keshishian wrote: On Thu, Jun 26, 2014 at 06:37:00AM +0200, Alexander Hall wrote: On 06/25/14 20:52, Bob Beck wrote: If you or someone you love runs an anoncvs server, they need to see this. As you know we recently added commitid support to cvs, and we had you update your cvsync binary. Unfortunately, the fix wasn't quite right. We ran into problems with the synching of commitid files. naddy managed to cook a correct fix for us. anoncvs.ca (the fanout machine) has been fixed - again. What you need to do is to update your cvsync to the latest one that was just committed into ports (cvsync-0.25.0pre0 or later). Remove your scanfile (if any). Once you do that, re-run cvsync and you should be back in business. Is it known whether this requires a fixed upstream cvsync first, or if it is ok to apply the fix and your repo will become fine whenever the upstream is fixed? Anyone with insight, please feel free to chime in. The commit message suggests, at least, the server must be updated: | Add full support for commitid and bump protocol version | Old clients will receive updates with commitid stripped out. It seems, running updated cvsync against an un-updated cvsyncd (I assume) results in failure: Connecting to anoncvs1.usa.openbsd.org port Connected to 149.20.54.217 port Running... Updating (collection openbsd-src/rcs) Updater(RCS): UPDATE(delta): error Updater(RCS): UPDATE: /cvs/anoncvs/cvs/obsd/src/lib/libc/string/timingsafe_bcmp.3,v Updater: RCS Error Socket Error: recv: 2 residue 2 Receiver Error: recv Mux(SEND) Error: not running: 0 DirScan: RCS Error Mux(SEND) Error: not running: 1 FileScan(RCS): UPDATE /cvs/anoncvs/cvs/obsd/src/sbin/pfctl/parse.y,v FileScan: RCS Error Failed removing offending files locally and restarting cvsync helps.
pfctl: better af-to translation specs handling
Hi, collapse_redirspec is one of my pet peeve since the af-to support. Unfortunately we didn't put much effort in making it work well back then, but now it is time for a change! Improving upon the last diff here's a collapsed version of the collapse_redirspec, so to speak. Instead of having two loops: first counts addresses, the second one performs an action, here's a merged version that has all the checks done once and in the right order. No changes in the regress test output (all tests pass) and the gain is that we no longer need to explicitely set the address family for af-to rules, it will be deduced in most of the cases. For example, previously you'd have to do: pass in inet af-to inet6 from ::1 While it's clear that you're translating IPv4 to the IPv6. Now it's possible to omit the 'inet' part: pass in af-to inet6 from ::1 Which is in line with how nat-to and rdr-to target addresses affect selection of the address family for the whole rule, i.e.: % echo 'pass in rdr-to ::1' | pfctl -vnf - pass in inet6 all flags S/SA rdr-to ::1 (Of course af-to rule still needs the 'inet6' part, but that will require some other changes as well...) OK? P.S. This diff moves chunks around quite a bit and might require applying to tell the difference. diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y index ce80f8e..fabe210 100644 --- sbin/pfctl/parse.y +++ sbin/pfctl/parse.y @@ -2025,26 +2025,21 @@ filter_opt : USER uids { filter_opts.nat.rdr = $2; memcpy(filter_opts.nat.pool_opts, $3, sizeof(filter_opts.nat.pool_opts)); } | AFTO af FROM redirpool pool_opts { if (filter_opts.nat.rdr) { yyerror(cannot respecify af-to); YYERROR; } if ($2 == 0) { - yyerror(no address family specified); - YYERROR; - } - if ($4-host-af $4-host-af != $2) { - yyerror(af-to addresses must be in the - target address family); + yyerror(no target address family specified); YYERROR; } filter_opts.nat.af = $2; filter_opts.nat.rdr = $4; memcpy(filter_opts.nat.pool_opts, $5, sizeof(filter_opts.nat.pool_opts)); filter_opts.rdr.rdr = calloc(1, sizeof(struct redirection)); bzero(filter_opts.rdr.pool_opts, sizeof(filter_opts.rdr.pool_opts)); @@ -3482,21 +3477,21 @@ redirspec : host optweight{ n-weight = $2; } $$ = $1; } | '{' optnl redir_host_list '}' { $$ = $3; } ; redir_host_list: host optweight optnl { if ($1-addr.type != PF_ADDR_ADDRMASK) { free($1); - yyerror(only addresses can be listed for + yyerror(only addresses can be listed for redirection pools ); YYERROR; } if ($2 0) { struct node_host*n; for (n = $1; n != NULL; n = n-next) n-weight = $2; } $$ = $1; } @@ -4288,118 +4283,139 @@ expand_queue(char *qname, struct node_if *interfaces, struct queue_opts *opts) FREE_LIST(struct node_if, interfaces); return (0); } int collapse_redirspec(struct pf_pool *rpool, struct pf_rule *r, struct redirspec *rs, u_int8_t allow_if) { struct pf_opt_tbl *tbl = NULL; - struct node_host *h; + struct node_host *h, *hprev = NULL; struct pf_rule_addr ra; - int af = 0, i = 0; + int af = 0, naddr = 0; if (!rs || !rs-rdr || rs-rdr-host == NULL) { rpool-addr.type = PF_ADDR_NONE; return (0); } if (r-rule_flag PFRULE_AFTO) r-naf = rs-af; - /* count matching addresses */ for (h = rs-rdr-host; h != NULL; h = h-next) { - if (h-af) { - /* check that af-to target address has correct af */ - if (rs-af rs-af != h-af) { - yyerror(af mismatch in %s spec, - allow_if ? routing : translation); -
Re: pfctl: stricter redirect specs
On Tue, Jun 24, 2014 at 15:07 +0200, Mike Belopuhov wrote: I have carefully tested that and do not expect any unrelated fallout. And for the reasons stated above I don't believe anyone is using this since it's largely error prone. and a regress chunk that avoids using such combination. note that this diff basically finishes previous attemts at modifying tests to match present syntax. diff --git regress/sbin/pfctl/pf48.in regress/sbin/pfctl/pf48.in index 540711a..a0dd143 100644 --- regress/sbin/pfctl/pf48.in +++ regress/sbin/pfctl/pf48.in @@ -1,12 +1,12 @@ table regress { 1.2.3.4 !5.6.7.8 10/8 lo0 } table regress.1 const { ::1 fe80::/64 } table regress.a { 1.2.3.4 !5.6.7.8 } { ::1 ::2 ::3 } file /dev/null const { 4.3.2.1 } -match out on lo0 from regress.1 to regress.2 nat-to lo0:0 +match out on lo0 inet from regress.1 to regress.2 nat-to lo0:0 match out on !lo0 inet from !regress.1 to regress.2 nat-to lo0:0 match in on lo0 inet6 from regress.1 to regress.2 rdr-to lo0:0 -match in on !lo0 from ! regress.1 to regress.2 rdr-to lo0:0 +match in on !lo0 inet6 from ! regress.1 to regress.2 rdr-to lo0:0 match in from { regress.1 !regress.2 } to any match out from any to { !regress.1, regress.2 } pass in from regress to any pass out from any to regress pass in from { regress.1 regress.2 } to any diff --git regress/sbin/pfctl/pf48.loaded regress/sbin/pfctl/pf48.loaded index 68b9854..cfdc8a2 100644 --- regress/sbin/pfctl/pf48.loaded +++ regress/sbin/pfctl/pf48.loaded @@ -1,7 +1,7 @@ -@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1 - [ Skip steps: d=2 p=end da=4 sp=end dp=end ] +@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1 + [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] [ Evaluations: 0 Packets: 0 Bytes: 0 States: 0 ] @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 127.0.0.1 [ Skip steps: p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] diff --git regress/sbin/pfctl/pf48.ok regress/sbin/pfctl/pf48.ok index 5f6e6ab..aeccfcf 100644 --- regress/sbin/pfctl/pf48.ok +++ regress/sbin/pfctl/pf48.ok @@ -1,9 +1,9 @@ table regress { 1.2.3.4 !5.6.7.8 10.0.0.0/8 ::1 fe80::1 127.0.0.1 } table regress.1 const { ::1 fe80::/64 } table regress.a const { 1.2.3.4 !5.6.7.8 ::1 ::2 ::3 } file /dev/null { 4.3.2.1 } -match out on lo0 inet6 from regress.1 to regress.2 nat-to ::1 +match out on lo0 inet from regress.1 to regress.2 nat-to 127.0.0.1 match out on ! lo0 inet from ! regress.1 to regress.2 nat-to 127.0.0.1 match in on lo0 inet6 from regress.1 to regress.2 rdr-to ::1 match in on ! lo0 inet6 from ! regress.1 to regress.2 rdr-to ::1 match in from regress.1 to any match in from ! regress.2 to any diff --git regress/sbin/pfctl/pf48.optimized regress/sbin/pfctl/pf48.optimized index 63309a7..90e2036 100644 --- regress/sbin/pfctl/pf48.optimized +++ regress/sbin/pfctl/pf48.optimized @@ -1,7 +1,7 @@ -@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1 - [ Skip steps: d=2 p=end da=4 sp=end dp=end ] +@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1 + [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] [ Evaluations: 0 Packets: 0 Bytes: 0 States: 0 ] @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 127.0.0.1 [ Skip steps: p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] diff --git regress/sbin/pfctl/pf98.in regress/sbin/pfctl/pf98.in index a18a491..a224f9e 100644 --- regress/sbin/pfctl/pf98.in +++ regress/sbin/pfctl/pf98.in @@ -1,4 +1,4 @@ # Test rule order processing should pass (require-order no longer required) pass in on lo100 all -match out on lo0 all nat-to lo0 +match out on lo0 inet6 all nat-to lo0
Re: pf anchor references
On Mon, Jun 02, 2014 at 17:51 +0200, Mike Belopuhov wrote: Hi, I've been chasing some bugs in the pfctl anchor code for a couple of weeks and I'm not astonished at how loose the handling is in general. Lot's of rules and checks are being violated by some code paths while honoured by others. The problem I have is that it's truly a rabbit's hole: almost every bug I hit is a feature in disguise. One thing that I particularly hate is called an anchor reference. A good example of this is a ruleset like this: anchor foo { block all } anchor foo If you believe it should be a syntax error let me disappoint you. The second 'anchor foo' is just a reference to the anchor named foo defined before: # echo -e 'anchor foo {\n block all\n}\nanchor foo' | pfctl -f - # pfctl -a '*' -sr anchor foo all { block drop all } anchor foo all { block drop all } # echo -e 'pass all' | pfctl -a 'foo' -f - # pfctl -a '*' -sr anchor foo all { pass all flags S/SA } anchor foo all { pass all flags S/SA } And to demonstrate a reference specified by an absolute path we can add anchor /foo inside foo: # echo -e 'anchor /foo' | pfctl -a 'foo' -f - This obviously gets us an endless loop if we decide to print out contents recursively (pfctl -a '*' -sr). Thankfully this doesn't affect ruleset evaluation in the kernel. The basic question I have is why do we need this dangerous and (at least in my eyes) pretty useless mechanism? Any opinions on this? Does anyone make any sensible use of it? Should we try to simplify this by removing ambiguous functionality? Cheers, Mike While I've got a few responses on tech supporting me, I should probably ask misc@ as well for additional scrutiny. And besides, I've come up with an example that might simplify ruleset creation for say multiple customer vlans or rdomains. Does anyone use it like that? This looks like an anchor template that we want to specify but not use directly in the main ruleset, but it plays somewhat nicely with quick anchors. (The ruleset below was tested and works correctly) Cheers, Mike block all anchor customer1 quick on rdomain 1 { anchor /allow-egress anchor /allow-ssh anchor /allow-icmp-echo } anchor customer2 quick on rdomain 2 { anchor /allow-egress anchor /allow-ssh } pass in quick on rdomain 0 proto tcp to (self) port ssh pass out quick on rdomain 0 # end of ruleset evaluation block quick anchor allow-ssh { pass in proto tcp to (self) port ssh } anchor allow-icmp-echo { pass in inet proto icmp to (self) icmp-type echoreq code 0 } anchor allow-egress { pass out }
Re: Remove a global variable in ip_input
On 4 June 2014 12:30, Martin Pieuchot mpieuc...@nolizard.org wrote: ok? sure
Re: in_pcbbind() and in_broadcast/in_iawithaddr
On 3 June 2014 09:18, Martin Pieuchot mpieuc...@nolizard.org wrote: On 02/06/14(Mon) 15:45, Martin Pieuchot wrote: This diff is similar to the one that has been committed to handle the SOCK_RAW binding. I'd like to stop using in_iawithaddr() *and* in_broadcast(). Since these functions are just doing an iteration on all the addresses present in the RB-tree (or equivalent), let's use ifa_ifwithaddr() instead. This diff should not introduce any behavior change concerning SOCK_DGRAM and binding to multicast addresses. As pointed out by jca@ this diff breaks on loopback. This is because the loopback IPv4 addresses are abusing the dstaddr field to be able to create a route to their address. Hopefully this hack can be removed once the local route diff is in, but in the meantime let's use the less intuitive but equivalent idiom: sin-sin_addr.s_addr != ia-ia_addr.s_addr Update diff below, is this one ok? OK
Re: PRU_BIND in raw ip
On 28 May 2014 13:36, Martin Pieuchot mpieuc...@nolizard.org wrote: On 28/05/14(Wed) 09:30, Jérémie Courrèges-Anglas wrote: Martin Pieuchot mpieuc...@nolizard.org writes: Diff below replace in_iawithaddr() + in_broadcast() - ifa_ifwithaddr(), that does the same for IPv4 since broadcast addresses are added to the tree. This prevents listeners to bind on 255.255.255.255, something allowed with the current code. Thoughts? You're right, here's an updated diff that keeps this behavior. OK
Re: Remove p2p loopback hack in nd6_rtrequest()
On Mon, May 12, 2014 at 12:48 +0200, Martin Pieuchot wrote: On 07/05/14(Wed) 12:46, Martin Pieuchot wrote: Diff below stops abusing nd6_rtrequest() for loopback interfaces, which means we can remove the special hack below and reduce the differences with arp_rtrequest(). This diff introduces two changes in the inet6 routing table, but they should not matter. The first one is that the gateway of the link-local entry for loopback interfaces is no longer a buggy link-layer address: -fe80::1%lo0link#6 UHL 00 - 4 lo0 +fe80::1%lo0fe80::1%lo0UHL 00 - 4 lo0 The second one is that every route to network associated to the loopback interface will now have the mtu of this interface: -ff02::/16 ::1UGRS 00 - 8 lo0 +ff02::/16 ::1UGRS 00 33192 8 lo0 Both changes are consistent with the IPv4 behavior, ok? Anyone? OK Index: netinet6/in6.c === RCS file: /home/ncvs/src/sys/netinet6/in6.c,v retrieving revision 1.136 diff -u -p -r1.136 in6.c --- netinet6/in6.c 5 May 2014 11:44:33 - 1.136 +++ netinet6/in6.c 7 May 2014 10:29:24 - @@ -1399,7 +1399,7 @@ in6_ifinit(struct ifnet *ifp, struct in6 /* Add ownaddr as loopback rtentry, if necessary (ex. on p2p link). */ if (newhost) { /* set the rtrequest function to create llinfo */ - if ((ifp-if_flags IFF_POINTOPOINT) == 0) + if ((ifp-if_flags (IFF_LOOPBACK | IFF_POINTOPOINT)) == 0) ia6-ia_ifa.ifa_rtrequest = nd6_rtrequest; rt_ifa_addloop((ia6-ia_ifa)); Index: netinet6/nd6.c === RCS file: /home/ncvs/src/sys/netinet6/nd6.c,v retrieving revision 1.116 diff -u -p -r1.116 nd6.c --- netinet6/nd6.c 7 May 2014 08:14:59 - 1.116 +++ netinet6/nd6.c 7 May 2014 10:29:24 - @@ -1059,20 +1059,14 @@ nd6_rtrequest(int req, struct rtentry *r #endif /* FALLTHROUGH */ case RTM_RESOLVE: - if ((ifp-if_flags (IFF_POINTOPOINT | IFF_LOOPBACK)) == 0) { - /* -* Address resolution isn't necessary for a point to -* point link, so we can skip this test for a p2p link. -*/ - if (gate-sa_family != AF_LINK || - gate-sa_len sizeof(null_sdl)) { - log(LOG_DEBUG, %s: bad gateway value: %s\n, - __func__, ifp-if_xname); - break; - } - SDL(gate)-sdl_type = ifp-if_type; - SDL(gate)-sdl_index = ifp-if_index; + if (gate-sa_family != AF_LINK || + gate-sa_len sizeof(null_sdl)) { + log(LOG_DEBUG, %s: bad gateway value: %s\n, + __func__, ifp-if_xname); + break; } + SDL(gate)-sdl_type = ifp-if_type; + SDL(gate)-sdl_index = ifp-if_index; if (ln != NULL) break; /* This happens on a route change */ /*
Re: snmpd: add backend from bgpd to support multiple routing tables
On Mon, Apr 28, 2014 at 14:20 +0200, Mike Belopuhov wrote: This adds ktable code from bgpd to fetch, store and perform lookups in multiple routing tables. Currently it doesn't do anything useful but it's a prerequisite for any future work in this direction. OK to get this in? Any objections? diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c index e157b25..e19f924 100644 --- usr.sbin/snmpd/kroute.c +++ usr.sbin/snmpd/kroute.c @@ -45,10 +45,13 @@ #include snmpd.h extern struct snmpd *env; +struct ktable**krt; +u_int krt_size; + struct { struct event ks_ev; u_long ks_iflastchange; u_long ks_nroutes;/* 4 billions enough? */ int ks_fd; @@ -77,24 +80,32 @@ struct kif_node { int kroute_compare(struct kroute_node *, struct kroute_node *); int kroute6_compare(struct kroute6_node *, struct kroute6_node *); int kif_compare(struct kif_node *, struct kif_node *); -struct kroute_node *kroute_find(in_addr_t, u_int8_t, u_int8_t); +void ktable_init(void); +int ktable_new(u_int, u_int); +void ktable_free(u_int); +int ktable_exists(u_int, u_int *); +struct ktable*ktable_get(u_int); +int ktable_update(u_int); + +struct kroute_node *kroute_find(struct ktable *, in_addr_t, u_int8_t, + u_int8_t); struct kroute_node *kroute_matchgw(struct kroute_node *, struct sockaddr_in *); -int kroute_insert(struct kroute_node *); -int kroute_remove(struct kroute_node *); -void kroute_clear(void); +int kroute_insert(struct ktable *, struct kroute_node *); +int kroute_remove(struct ktable *, struct kroute_node *); +void kroute_clear(struct ktable *); -struct kroute6_node *kroute6_find(const struct in6_addr *, u_int8_t, - u_int8_t); +struct kroute6_node *kroute6_find(struct ktable *, const struct in6_addr *, + u_int8_t, u_int8_t); struct kroute6_node *kroute6_matchgw(struct kroute6_node *, struct sockaddr_in6 *); -int kroute6_insert(struct kroute6_node *); -int kroute6_remove(struct kroute6_node *); -void kroute6_clear(void); +int kroute6_insert(struct ktable *, struct kroute6_node *); +int kroute6_remove(struct ktable *, struct kroute6_node *); +void kroute6_clear(struct ktable *); struct kif_arp *karp_find(struct sockaddr *, u_short); int karp_insert(struct kif_node *, struct kif_arp *); int karp_remove(struct kif_node *, struct kif_arp *); @@ -121,23 +132,21 @@ voidif_newaddr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_deladdr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_announce(void *); -int fetchtable(void); +int fetchtable(struct ktable *); int fetchifs(u_short); -int fetcharp(void); +int fetcharp(struct ktable *); void dispatch_rtmsg(int, short, void *); int rtmsg_process(char *, int); -int dispatch_rtmsg_addr(struct rt_msghdr *, +int dispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *, struct sockaddr *[RTAX_MAX]); -RB_HEAD(kroute_tree, kroute_node)krt; RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare) RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare) -RB_HEAD(kroute6_tree, kroute6_node) krt6; RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_HEAD(kif_tree, kif_node) kit; RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare) @@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare) void kr_init(void) { int opt = 0, rcvbuf, default_rcvbuf; + unsigned inttid = RTABLE_ANY; socklen_t optlen; if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) fatal(kr_init: ioctl socket); @@ -179,31 +189,166 @@ kr_init(void) setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, sizeof(rcvbuf)) == -1 errno == ENOBUFS; rcvbuf /= 2) ; /* nothing */ - RB_INIT(krt); - RB_INIT(krt6); + if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid, + sizeof(tid)) == -1) + log_warn(kr_init: setsockopt AF_ROUTE
Re: MI MTU size for lo(4)
On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote: With KAME the MTU size of the loopback interface became strange and is actually dependend on the architecture. I see no point in all this just go back to the way it was long long long ago and just use 32k as the MTU. AFAIK all of this was only done to test large IPv6 packets but why MHLEN and MLEN were added is still beyond my imagination. In fact this comes from this commit in NetBSD originally: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20 Comment says Add MHLEN + MLEN extra space to LOMTU for IP and transport headers. This doesn't make me understand why this is needed however. So unless this clues you in, I'm OK with setting it to 32k everywhere.
Re: MI MTU size for lo(4)
On 13 May 2014 16:05, Mike Belopuhov m...@belopuhov.com wrote: On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote: With KAME the MTU size of the loopback interface became strange and is actually dependend on the architecture. I see no point in all this just go back to the way it was long long long ago and just use 32k as the MTU. AFAIK all of this was only done to test large IPv6 packets but why MHLEN and MLEN were added is still beyond my imagination. In fact this comes from this commit in NetBSD originally: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20 Comment says Add MHLEN + MLEN extra space to LOMTU for IP and transport headers. This doesn't make me understand why this is needed however. So unless this clues you in, I'm OK with setting it to 32k everywhere. And purely for your amusement: FreeBSD have decreased MTU from 64k to 16k in '95: http://svnweb.freebsd.org/base/head/sys/net/if_loop.c?r1=6875r2=6876 Perhaps (and I'm somewhat speculating here) because of TCP performance problems, like so: http://marc.info/?l=netbsd-current-usersm=102684186303106w=2 (fixed diff is in the next mail though)
Re: m-m_pkthdr.rcvif and ip6_input()
On 12 May 2014 15:12, Martin Pieuchot mpieuc...@nolizard.org wrote: Like the previous diffs, it reduces the number of m-m_pkthdr.rcvif occurrences, this time in ip6_input(). Should be no functional change. Ok? OK
Re: Annoying emacs variable in if_spppsubr.c
On 2 May 2014 12:09, Jérémie Courrèges-Anglas j...@wxcvbn.org wrote: This one is bugging me each time I start my Emacs session (because Emacs now asks confirmation for most variables). This one would be useful only with hilit19.el (obsolete) from editors/emacs21... if the size of the file was actually under 12 bytes (it's not anymore). ok to kill it? yes, please.
Re: data modified on freelist, tmpfs-related?
On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST) From: Mark Kettenis mark.kette...@xs4all.nl Date: Wed, 30 Apr 2014 13:39:20 +0100 From: Stuart Henderson st...@openbsd.org Seen when running e2fsprogs regression tests with /tmp on tmpfs I'm not surprised; tmpfs contains some serious bugs. I recommend not using it until those are fixed. Which means, I'd like somebody else besides espie@ to comment on my uvm_aobj.c list manipulation hack diff. Diff made sense to me when I looked at it, but I would rather hide direct pointer access :/ Perhaps LIST_SWAP does a tiny bit more, but it's cleaner and perhaps can be useful in the future.
Re: data modified on freelist, tmpfs-related?
On 30 April 2014 16:55, Mark Kettenis mark.kette...@xs4all.nl wrote: From: Mike Belopuhov m...@belopuhov.com Date: Wed, 30 Apr 2014 16:00:45 +0200 On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST) From: Mark Kettenis mark.kette...@xs4all.nl Date: Wed, 30 Apr 2014 13:39:20 +0100 From: Stuart Henderson st...@openbsd.org Seen when running e2fsprogs regression tests with /tmp on tmpfs I'm not surprised; tmpfs contains some serious bugs. I recommend not using it until those are fixed. Which means, I'd like somebody else besides espie@ to comment on my uvm_aobj.c list manipulation hack diff. Diff made sense to me when I looked at it, but I would rather hide direct pointer access :/ Perhaps LIST_SWAP does a tiny bit more, but it's cleaner and perhaps can be useful in the future. I'm not comfortable with introducing more sys/queue.h APIs. So perhaps we should just punt on the optimization and remove/insert all list items. Removing the trap comments that pedro set up... I agree.
snmpd: add backend from bgpd to support multiple routing tables
This adds ktable code from bgpd to fetch, store and perform lookups in multiple routing tables. Currently it doesn't do anything useful but it's a prerequisite for any future work in this direction. OK to get this in? diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c index e157b25..e19f924 100644 --- usr.sbin/snmpd/kroute.c +++ usr.sbin/snmpd/kroute.c @@ -45,10 +45,13 @@ #include snmpd.h extern struct snmpd*env; +struct ktable **krt; +u_intkrt_size; + struct { struct event ks_ev; u_long ks_iflastchange; u_long ks_nroutes;/* 4 billions enough? */ int ks_fd; @@ -77,24 +80,32 @@ struct kif_node { intkroute_compare(struct kroute_node *, struct kroute_node *); intkroute6_compare(struct kroute6_node *, struct kroute6_node *); intkif_compare(struct kif_node *, struct kif_node *); -struct kroute_node *kroute_find(in_addr_t, u_int8_t, u_int8_t); +voidktable_init(void); +int ktable_new(u_int, u_int); +voidktable_free(u_int); +int ktable_exists(u_int, u_int *); +struct ktable *ktable_get(u_int); +int ktable_update(u_int); + +struct kroute_node *kroute_find(struct ktable *, in_addr_t, u_int8_t, + u_int8_t); struct kroute_node *kroute_matchgw(struct kroute_node *, struct sockaddr_in *); -int kroute_insert(struct kroute_node *); -int kroute_remove(struct kroute_node *); -voidkroute_clear(void); +int kroute_insert(struct ktable *, struct kroute_node *); +int kroute_remove(struct ktable *, struct kroute_node *); +voidkroute_clear(struct ktable *); -struct kroute6_node*kroute6_find(const struct in6_addr *, u_int8_t, -u_int8_t); +struct kroute6_node*kroute6_find(struct ktable *, const struct in6_addr *, + u_int8_t, u_int8_t); struct kroute6_node*kroute6_matchgw(struct kroute6_node *, struct sockaddr_in6 *); -int kroute6_insert(struct kroute6_node *); -int kroute6_remove(struct kroute6_node *); -voidkroute6_clear(void); +int kroute6_insert(struct ktable *, struct kroute6_node *); +int kroute6_remove(struct ktable *, struct kroute6_node *); +voidkroute6_clear(struct ktable *); struct kif_arp *karp_find(struct sockaddr *, u_short); int karp_insert(struct kif_node *, struct kif_arp *); int karp_remove(struct kif_node *, struct kif_arp *); @@ -121,23 +132,21 @@ void if_newaddr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_deladdr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_announce(void *); -intfetchtable(void); +intfetchtable(struct ktable *); intfetchifs(u_short); -intfetcharp(void); +intfetcharp(struct ktable *); void dispatch_rtmsg(int, short, void *); intrtmsg_process(char *, int); -intdispatch_rtmsg_addr(struct rt_msghdr *, +intdispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *, struct sockaddr *[RTAX_MAX]); -RB_HEAD(kroute_tree, kroute_node) krt; RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare) RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare) -RB_HEAD(kroute6_tree, kroute6_node)krt6; RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_HEAD(kif_tree, kif_node)kit; RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare) @@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare) void kr_init(void) { int opt = 0, rcvbuf, default_rcvbuf; + unsigned inttid = RTABLE_ANY; socklen_t optlen; if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) fatal(kr_init: ioctl socket); @@ -179,31 +189,166 @@ kr_init(void) setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, sizeof(rcvbuf)) == -1 errno == ENOBUFS; rcvbuf /= 2) ; /* nothing */ - RB_INIT(krt); - RB_INIT(krt6); + if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid, + sizeof(tid)) == -1) + log_warn(kr_init: setsockopt AF_ROUTE ROUTE_TABLEFILTER); + RB_INIT(kit); RB_INIT(kat); if
Re: iked + isakmpd on the same machine
On 24 April 2014 12:12, Philipp e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote: Am 22.04.2014 17:28 schrieb Mike Belopuhov: more like it's not supported and is not supposed to work. not supposed as in 'not wanted'? not supposed. it's like running nginx and apache at the same time but Quite frankly: I'm doing that in some locations ;-) not on the same port (80) though. ikev2 and isakmp both use same udp ports (500 and 4500). worse since there are kernel tentacles involved as well (as you might have figured out already) that will likely That's somehow the main problem, the two daemons are not trying to share the pfkey2 ioctls outcome. i don't see it like that. So, I can wait til iked supports ikev1, too. there are no current plans to implement ikev1 support that i'm aware of. Using a different machine will be quite painful at the moment. Rock+hard place. prevent you from doing that on the same box but different ip addresses. Nevertheless I'd say that a Listen-on style directive for iked would a useful thing[tm], e.g. to default the srcid to that. perhaps. currently i believe srcid will default to local if specified. Cheers.
Re: Kill in_localaddr()
On 24 April 2014 16:41, Martin Pieuchot mpieuc...@nolizard.org wrote: in_localaddr() is used only once in our tree and only if the sysctl net.inet.ip.mtudisc is set to 0. It is used to optimize the size of the MSS if the forward address correspond to a host on one of our subnets. Since it's an optimization for a special case that's not enabled by default, I'd like to kill it to remove one usage of the global list of IPv4 addresses. While here get rid of the #ifdef RTV_MTU, it is here. ok? OK
Re: iked + isakmpd on the same machine
On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote: Mike Belopuhov [m...@belopuhov.com] wrote: more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time hey, nginx and httpd run concurrently quite fine on different IP addresses, same box :) i meant using the same port numbers of course.
Re: iked + isakmpd on the same machine
On 24 April 2014 22:25, Alexander Hall alexan...@beard.se wrote: On 04/24/14 21:53, Stuart Henderson wrote: On 2014/04/24 20:30, Mike Belopuhov wrote: On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote: Mike Belopuhov [m...@belopuhov.com] wrote: more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time hey, nginx and httpd run concurrently quite fine on different IP addresses, same box :) i meant using the same port numbers of course. they can do that fine too! :) just have one hand-off the relevant requests to the other. If they bind to separate IP addresses that is obviously not a problem, even for the same port numbers. yes. that's precisely what i meant: you can't bind to the same ipaddr:port pair twice. why do i have to chew it and spit it out for you. it was clear what i meant from the start.
Re: IPv6 DoS sysctl man page additions
On 19 April 2014 13:20, Loganaden Velvindron lo...@elandsys.com wrote: On Sat, Apr 19, 2014 at 04:04:30AM -0700, Loganaden Velvindron wrote: Hi All, I'm taking a short break from playing with pf statistics. There were 4 sysctls added from KAME, but the man pages weren't updated accordingly. (Adapted from the NetBSD man page changes) Feedback welcomed. Removed trailing spaces and use set to instead of set it to based on feedback from sthen@ OK mikeb
Re: iked + isakmpd on the same machine
On 22 April 2014 17:13, Philipp e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote: It happened! A remote peer *requires* IKEv2 - and I've to do that on a machine running isakmpd with somewhat 25+ IKEv1 peers. First hurdle: I cannot bind iked to a certain (carp) IP-address. Mad workaround: start isakmpd (with Listen-on) first. Second hurdle: iked loads its SAs and eventually does this by creating a new empty SADB, effectivly killing all the SAs isakmpd loaded into the kernel before? Is there a diff sleeping out there for tackling the first hurdle? For the second one, I've to refrain from testing this in live further more. First to reconstruct my Frankenstein-Lab. Cheers for any thoughts beside mad, bro? :-) more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time but worse since there are kernel tentacles involved as well (as you might have figured out already) that will likely prevent you from doing that on the same box but different ip addresses. cheers, mike
Re: snmpd: support for ipNetToMediaTable (ARP table exporting)
On Mon, Apr 07, 2014 at 17:03 +0200, Mike Belopuhov wrote: a bit of an update, mainly style changes. one functional change: don't rely on rtm_rmx.rmx_expire to set the F_STATIC flag as rt_getmetrics is not called consistenly (only with RTM_GETs) and besides RTF_STATIC flag is already present for static ARP entries. http://www.vantronix.net/~mike/snmpd-arp.diff I've ditched rdomain kludges to simplify the diff and because actual rdomain support doesn't need any of those. OK? diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c index 1ed4d17..e157b25 100644 --- usr.sbin/snmpd/kroute.c +++ usr.sbin/snmpd/kroute.c @@ -69,10 +69,11 @@ struct kroute6_node { }; struct kif_node { RB_ENTRY(kif_node) entry; TAILQ_HEAD(, kif_addr) addrs; + TAILQ_HEAD(, kif_arp)arps; struct kif k; }; intkroute_compare(struct kroute_node *, struct kroute_node *); intkroute6_compare(struct kroute6_node *, struct kroute6_node *); @@ -91,10 +92,14 @@ struct kroute6_node *kroute6_matchgw(struct kroute6_node *, struct sockaddr_in6 *); int kroute6_insert(struct kroute6_node *); int kroute6_remove(struct kroute6_node *); voidkroute6_clear(void); +struct kif_arp *karp_find(struct sockaddr *, u_short); +int karp_insert(struct kif_node *, struct kif_arp *); +int karp_remove(struct kif_node *, struct kif_arp *); + struct kif_node*kif_find(u_short); struct kif_node*kif_insert(u_short); int kif_remove(struct kif_node *); voidkif_clear(void); struct kif *kif_update(u_short, int, struct if_data *, @@ -118,10 +123,11 @@ void if_deladdr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_announce(void *); intfetchtable(void); intfetchifs(u_short); +intfetcharp(void); void dispatch_rtmsg(int, short, void *); intrtmsg_process(char *, int); intdispatch_rtmsg_addr(struct rt_msghdr *, struct sockaddr *[RTAX_MAX]); @@ -182,10 +188,12 @@ kr_init(void) if (fetchifs(0) == -1) fatalx(kr_init fetchifs); if (fetchtable() == -1) fatalx(kr_init fetchtable); + if (fetcharp() == -1) + fatalx(kr_init fetcharp); event_set(kr_state.ks_ev, kr_state.ks_fd, EV_READ | EV_PERSIST, dispatch_rtmsg, NULL); event_add(kr_state.ks_ev, NULL); } @@ -519,10 +527,123 @@ kroute6_clear(void) while ((kr = RB_MIN(kroute6_tree, krt6)) != NULL) kroute6_remove(kr); } +static inline int +karp_compare(struct kif_arp *a, struct kif_arp *b) +{ + /* Interface indices are assumed equal */ + if (ntohl(a-addr.sin.sin_addr.s_addr) + ntohl(b-addr.sin.sin_addr.s_addr)) + return (1); + if (ntohl(a-addr.sin.sin_addr.s_addr) + ntohl(b-addr.sin.sin_addr.s_addr)) + return (-1); + return (0); +} + +static inline struct kif_arp * +karp_search(struct kif_node *kn, struct kif_arp *ka) +{ + struct kif_arp *pivot; + + TAILQ_FOREACH(pivot, kn-arps, entry) { + switch (karp_compare(ka, pivot)) { + case 0: /* found */ + return (pivot); + case -1: /* ka pivot, end the search */ + return (NULL); + } + } + /* looped through the whole list and didn't find */ + return (NULL); +} + +struct kif_arp * +karp_find(struct sockaddr *sa, u_short ifindex) +{ + struct kif_node *kn; + struct kif_arp *ka = NULL, s; + + memcpy(s.addr.sa, sa, sa-sa_len); + + if (ifindex == 0) { + /* +* We iterate manually to handle zero ifindex special +* case differently from kif_find, in particular we +* want to look for the address on all available +* interfaces. +*/ + RB_FOREACH(kn, kif_tree, kit) { + if ((ka = karp_search(kn, s)) != NULL) + break; + } + } else { + if ((kn = kif_find(ifindex)) == NULL) + return (NULL); + ka = karp_search(kn, s); + } + return (ka); +} + +int +karp_insert(struct kif_node *kn, struct kif_arp *ka) +{ + struct kif_arp *pivot; + + if (ka-if_index == 0) + return (-1); + if (!kn (kn = kif_find(ka-if_index)) == NULL) + return (-1); + /* Put entry on the list in the ascending lexical order */ + TAILQ_FOREACH(pivot, kn-arps, entry) { + switch