Re: Small ifconfig output tweak for inet6?

2015-03-26 Thread Mike Belopuhov
On 26 March 2015 at 14:27, Stuart Henderson st...@openbsd.org wrote:
 seems reasonable. (I'd quite like that for v4 too, though it wouldn't
 cope with non-contiguous netmask ;)


non-contiguous netmasks for IPv4 addresses configured on an interface?
is that possible?  what's the use case?
perhaps you're confusing this with  non-contiguous netmasks in the radix
tree that are entered by the ipsec flows containing port numbers?

however I agree that if we do this for ipv6 we should do it for ipv4 as well
but then do we care about tons of stuff out there parsing ifconfig output?



Re: Avoid doing IPv6 SLAAC for prefixes with preferred lifetime of zero

2015-03-09 Thread Mike Belopuhov
On 9 March 2015 at 16:37, Stuart Henderson st...@openbsd.org wrote:

 whitespace nit here (5 char indent), otherwise OK

done on purpose.



Avoid doing IPv6 SLAAC for prefixes with preferred lifetime of zero

2015-03-09 Thread Mike Belopuhov
Hi,

It looks like Mac OS X puts some IPv6 garbage on the wire and
our cheap consumer router starts happily advertising routes
like this:

 4006:16e1:ac17:189::/64 if=re0
 flags=LAO vltime=6401, pltime=0, expire=1h46m34s, ref=0
   advertised by
 fe80::9ec7:a6ff:fe86:a3f4%re0 (reachable)

This makes us generate autoconf and privacy addresses which
immediately become deprecated (pltime=0 translates to a zero
preferred lifetime for these addresses).  Within minutes on
an active network you end up with hundreds of dead IPv6
addresses.

RFC 4941 says in 3.3.5:

 In particular, an implementation MUST NOT create a temporary
  address with a zero Preferred Lifetime.

Which means we have to strengthen the lifetime check.  OK?

This also raises the question whether we should memorize such
prefixes at all -- but I haven't found a straight answer just
yet.


diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c
index de23248..d7f59de 100644
--- sys/netinet6/nd6_rtr.c
+++ sys/netinet6/nd6_rtr.c
@@ -1374,16 +1374,17 @@ prelist_update(struct nd_prefix *new, struct 
nd_defrouter *dr, struct mbuf *m)
ia6-ia6_lifetime = lt6_tmp;
ia6-ia6_updatetime = time_second;
}
 
if ((!autoconf || ((ifp-if_xflags  IFXF_INET6_NOPRIVACY) == 0 
-   !tempaddr_preferred))  new-ndpr_vltime != 0 
+!tempaddr_preferred)) 
+   new-ndpr_vltime != 0  new-ndpr_pltime != 0 
!((ifp-if_xflags  IFXF_INET6_NOPRIVACY)  statique)) {
/*
 * There is no SLAAC address and/or there is no preferred RFC
-* 4941 temporary address. And the valid prefix lifetime is
-* non-zero. And there is no static address in the same prefix.
+* 4941 temporary address. And prefix lifetimes are non-zero.
+* And there is no static address in the same prefix.
 * Create new addresses in process context.
 * Increment prefix refcount to ensure the prefix is not
 * removed before the task is done.
 */
pr-ndpr_refcnt++;



Re: splassert: rtrequest1: want 5 have 0

2015-02-19 Thread Mike Belopuhov
On 19 February 2015 at 21:30, Alexander Bluhm alexander.bl...@gmx.net wrote:
 On Wed, Feb 18, 2015 at 12:14:15PM +0100, Matthieu Herrb wrote:
 Feb 18 12:09:59 castor /bsd: splassert: rtrequest1: want 5 have 0
 Feb 18 12:09:59 castor /bsd: Starting stack trace...
 Feb 18 12:09:59 castor /bsd: splassert_check() at splassert_check+0x78
 Feb 18 12:09:59 castor /bsd: rtrequest1() at rtrequest1+0x5e
 Feb 18 12:09:59 castor /bsd: nd6_prefix_offlink() at
 nd6_prefix_offlink+0x1bf
 Feb 18 12:09:59 castor /bsd: pfxlist_onlink_check() at
 pfxlist_onlink_check+0x25e
 Feb 18 12:09:59 castor /bsd: in6_control() at in6_control+0x894
 Feb 18 12:09:59 castor /bsd: ifioctl() at ifioctl+0x175
 Feb 18 12:09:59 castor /bsd: sys_ioctl() at sys_ioctl+0x169
 Feb 18 12:09:59 castor /bsd: syscall() at syscall+0x297
 Feb 18 12:09:59 castor /bsd: --- syscall (number 54) ---
 Feb 18 12:09:59 castor /bsd: end of kernel
 Feb 18 12:09:59 castor /bsd: end trace frame: 0xc8115948400, count:
 249
 Feb 18 12:09:59 castor /bsd: 0xc8115715cda:
 Feb 18 12:09:59 castor /bsd: End of stack trace.
 Feb 18 12:10:00 castor /bsd: carp0: state transition: BACKUP - MASTER

 Most calls to pfxlist_onlink_check() are protected by splsoftnet.
 Only the path in your trace does not set it.  So I suggest to set
 splsoftnet() in in6_control().  I have included the dohooks() as
 this is done in IPv4.  While there I have moved some splsoftnet()
 hiding in the declarations to the beginning of the code.

 ok?

 bluhm


OK, thanks for taking a look!



Re: libpcap use after free

2015-01-15 Thread Mike Belopuhov
On 15 January 2015 at 03:53, Lawrence Teo l...@openbsd.org wrote:
 libpcap has a use after free (found via LLVM).

 pcap_close() currently looks like this:

 void
 pcap_close(pcap_t *p)
 {
 if (p-opt.source != NULL)
 free(p-opt.source);
 pcap_cleanup_bpf(p);
 free(p);
 }

 The bug affects libpcap programs that enable monitor mode on 802.11
 devices (i.e. if they call pcap_set_rfmon() followed by
 pcap_activate()).  If pcap_close() is called after that,
 pcap_cleanup_bpf() will attempt to use p-opt.source while trying to
 disable monitor mode, resulting in a use after free.

 The fix is simple (diff below).  I tested this with a small program
 that calls pcap_create(), pcap_set_rfmon(), pcap_activate(), and
 pcap_close() on an iwn(4) device with MALLOC_OPTIONS=AFGJPRX.
 With the diff applied, the test program no longer segfaults.

 ok?



Looks good to me. OK mikeb



Re: Kill IPv4 list of addresses

2015-01-12 Thread Mike Belopuhov
On 6 January 2015 at 13:26, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Diff below remove the last use of the global IPv4 list of addresses.

 The code using it is a hack to move the unique cloning route of a
 subnet from one ifa to another.  I know a proper fix would be to use
 multipath for that, but this is not possible feasible right now
 because we cannot select multipath route entries based on a different
 ifa.

 In the meantime this allow us to simplify in_ifinit() which still needs
 better error handling.

 ok?


Looks good to me.



Re: idle pool page gc

2014-12-22 Thread Mike Belopuhov
On 22 December 2014 at 06:43, David Gwynne da...@gwynne.id.au wrote:
 this introduces a global gc task that loops over all the pools
 looking for pages that havent been used for a very long time so
 they can be freed.

 this is the simplest way of doing this without introducing per pool
 timeouts/tasks which in turn could introduce races with pool_destroy,
 or more shared global state that gets touched on pool_get/put ops.

 im adding the timeout in src/sys/kern/init_main.c cos having pool_init
 do it causes faults when a pool is created before the timeout
 subsystem is set up.

 i have tested this on sparc64, amd64, and hppa, and it seems to do
 the advertised job.


i have a couple of questions to understand the reasoning behind this.

1) how is it different from what we have now?

2) why can't you do it when you pool_put?

3) why can't you call it from uvm when there's memory pressure since
   from what i understand pool_reclaim was supposed to work like that?

4) i assume you don't want to call pool_reclaim from pool_gc_pages
   because of the mutex_enter_try benefits, but why does logic in
   these functions differ, e.g. why did you omit the pr_itemsperpage
   bit?

5) why 8 ticks is a very long time to free a page?
   why not 10 seconds?

6) why is there an XXX next to the splx?

7) it looks like pool_reclaim_all should also raise an IPL since it
   does the same thing.  wasn't it noteced before?



Re: divert(4) m_pullup

2014-12-16 Thread Mike Belopuhov
On 16 December 2014 at 12:08, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Date: Mon, 15 Dec 2014 23:44:54 -0500
 From: Lawrence Teo l...@openbsd.org

 Make divert_output() do an m_pullup only if truly needed.

 ok?

 Questionable.  AFAIK m_pullup(9) will only do the pullup if it is
 necesary in the first place.

I agree.  m_pullup already checks that.

 Is there a measurable speedup from inlining the check?




Re: page fault at resume, possibly caused by java/jenkins

2014-12-11 Thread Mike Belopuhov
On 11 December 2014 at 14:16, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 11/12/14(Thu) 12:37, frantisek holop wrote:
   login: kernel: page fault trap, code=0
   Stopped at  in_selectsrc+0xd8:  movl 0xf0(%esi),%ebx
   ddb{0} trace
   in_selectsrc(f617cdc8,d80b6714,d35d9270,d8cfac44,d8cfac34) at 
   in_selectsrc+0xd8
  
   udp_output(d8cfabfc,d80b6900,d80b6700,0,0) at udp_output+0xf8
   sosend(d7f881c8,d80b6700,f617ce90,d80b6900,0) at sosend+0x44b
   sendit(d84f4b40,103,f617cef4,0,f617cf80) at sendit+0x1e1
   sys_sendto(d84f4b40,f617cf60,f617cf80,d0568b5a,d84f4b40) at 
   sys_sendto+0x6c
   syscall() at syscall+0x144
   --- syscall (number 259) ---
   0x2:
   ddb{0}

 run0 usb dongle + dhclient through a home router,
 with one special route added that is forwarded through
 a linux box (on the same home router network) which
 is connected to a vpn in a different country.
 it is added like this:

 # route add 192.168.10.2 $LINUX_IP

 jenkins is running a shell script, that is running
 tests on the 192.168.10.2 web site accessible only
 through the vpn.  the tests are simple selenium tests
 recorded through the firefox plugin.  a quick google
 reveals that jenkins itself might be using some
 udp/multicast functionality:
 https://wiki.jenkins-ci.org/display/JENKINS/Auto-discovering+Jenkins+on+the+network

 Could you try the diff below and let me know if it improves the
 situation?  It should prevent your kernel to panic.  Even though,
 more work might be required to have working socket w/ multicast
 using USB devices upon resume.

 What's happening is simply a use after free because there's no way to
 invalidate multicast options attached to a pcb when an interface is
 detached.  I which multicast would have been implemented using route
 entries instead of this hack...

 Change below works around this problem by using interface indexes.


Makes sense to me.  OK mikeb



Re: tcpdump follows incorrect alignment requirement rules

2014-12-09 Thread Mike Belopuhov
On 9 December 2014 at 19:00, Christian Weisgerber na...@mips.inka.de wrote:
 On 2014-12-03, Mike Belopuhov m...@belopuhov.com wrote:

 bpf aligns data following the datalink header (e.g. ethernet)
 on the BPF_ALIGNMENT boundary.  Since rev1.41 of bpf.h it's
 uint32_t instead of a long.  And also since then almost all
 packets become unaligned from the tcpdump perspective and
 require costly copies into the internal buffer.  Neither IP
 header (struct ip) nor IPv6 (struct ip6_hdr) have fields
 larger than 32 bits and therefore alignment requirements for
 them are at most 32 bit.

 I've successfully tested this diff on sparc64 and amd64.

 Oops, turns out I've had the same diff in my tree on my sparc64 for
 a long time (last year?).  No ill effects, but I do little tcpdumping
 on that machine.

 I think you have forgotten a chunk in print-atalk.c:


I did not.  I can't test AppleTalk, so if you can, please do so.
By analogy it looks correct (:



Call for testing of watchdog devices

2014-12-08 Thread Mike Belopuhov
Hi,

We plan to remove shutdown hooks for good and need to convert
all drivers implementing a watchdog(4) device to the config_*
framework namely implementing the activate method with a
DVACT_POWERDOWN action handler fleshed out.

This is the diff I came up with.  wdog_shutdown will now check
that the registered watchdog is the one we're disabling.

This compiles on amd64, i386 and sparc64, works fine on glxpcib
(the only one we have access to).

Here's a list of drivers that get changed:
 
 armv7/omap/omdog
 armv7/sunxi/sxidog

 i386/esm
 i386/elansc
 i386/geodesc

 sgi/imc

 sparc64/lom
 sparc64/pmc

 ipmi (not enabled)

 fins, it, schsio, viasio

 berkwdt, glxpcib, ichwdt, pwdog, tcpcib, wdt

Tests and OK's are welcome.


diff --git sys/arch/armv7/omap/omdog.c sys/arch/armv7/omap/omdog.c
index a8fb5d4..166bc11 100644
--- sys/arch/armv7/omap/omdog.c
+++ sys/arch/armv7/omap/omdog.c
@@ -55,18 +55,19 @@ struct omdog_softc {
 };
 
 struct omdog_softc *omdog_sc;
 
 void   omdog_attach(struct device *, struct device *, void *);
+intomdog_activate(struct device *, int);
 void   omdog_start(struct omdog_softc *);
 void   omdog_stop(struct omdog_softc *);
 void   omdog_sync(struct omdog_softc *);
 intomdog_cb(void *, int);
 void   omdog_reset(void);
 
 struct cfattachomdog_ca = {
-   sizeof (struct omdog_softc), NULL, omdog_attach
+   sizeof (struct omdog_softc), NULL, omdog_attach, NULL, omdog_activate
 };
 
 struct cfdriver omdog_cd = {
NULL, omdog, DV_DULL
 };
@@ -93,10 +94,24 @@ omdog_attach(struct device *parent, struct device *self, 
void *args)
 #ifndef SMALL_KERNEL
wdog_register(omdog_cb, sc);
 #endif
 }
 
+int
+omdog_activate(struct device *self, int act)
+{
+   switch (act) {
+   case DVACT_POWERDOWN:
+#ifndef SMALL_KERNEL
+   wdog_shutdown(self);
+#endif
+   break;
+   }
+
+   return (0);
+}
+
 void
 omdog_start(struct omdog_softc *sc)
 {
/* Write the enable sequence data h followed by h */
bus_space_write_4(sc-sc_iot, sc-sc_ioh, WSPR, 0x);
diff --git sys/arch/armv7/sunxi/sxidog.c sys/arch/armv7/sunxi/sxidog.c
index ab327dd..c59660f 100644
--- sys/arch/armv7/sunxi/sxidog.c
+++ sys/arch/armv7/sunxi/sxidog.c
@@ -64,18 +64,20 @@ struct sxidog_softc {
 };
 
 struct sxidog_softc *sxidog_sc = NULL; /* for sxidog_reset() */
 
 void sxidog_attach(struct device *, struct device *, void *);
+int sxidog_activate(struct device *, int);
 int sxidog_callback(void *, int);
 #if 0
 int sxidog_intr(void *);
 #endif
 void sxidog_reset(void);
 
 struct cfattachsxidog_ca = {
-   sizeof (struct sxidog_softc), NULL, sxidog_attach
+   sizeof (struct sxidog_softc), NULL, sxidog_attach,
+   NULL, sxidog_activate
 };
 
 struct cfdriver sxidog_cd = {
NULL, sxidog, DV_DULL
 };
diff --git sys/arch/i386/i386/esm.c sys/arch/i386/i386/esm.c
index 54f9f6d..40a278b 100644
--- sys/arch/i386/i386/esm.c
+++ sys/arch/i386/i386/esm.c
@@ -43,10 +43,11 @@ int esmdebug = 3;
 #define DPRINTFN(n,x...)   /* n: x */
 #endif
 
 intesm_match(struct device *, void *, void *);
 void   esm_attach(struct device *, struct device *, void *);
+intesm_activate(struct device *, int);
 
 enum esm_sensor_type {
ESM_S_UNKNOWN, /* XXX */
ESM_S_INTRUSION,
ESM_S_TEMP,
@@ -122,11 +123,12 @@ struct esm_softc {
int sc_wdog_period;
volatile intsc_wdog_tickle;
 };
 
 struct cfattach esm_ca = {
-   sizeof(struct esm_softc), esm_match, esm_attach
+   sizeof(struct esm_softc), esm_match, esm_attach,
+   NULL, esm_activate
 };
 
 struct cfdriver esm_cd = {
NULL, esm, DV_DULL
 };
@@ -272,10 +274,22 @@ esm_attach(struct device *parent, struct device *self, 
void *aux)
timeout_add_sec(sc-sc_timeout, 1);
}
 }
 
 int
+esm_activate(struct device *self, int act)
+{
+   switch (act) {
+   case DVACT_POWERDOWN:
+   wdog_shutdown(self);
+   break;
+   }
+
+   return (0);
+}
+
+int
 esm_watchdog(void *arg, int period)
 {
struct esm_softc*sc = arg;
struct esm_wdog_propprop;
struct esm_wdog_state   state;
diff --git sys/arch/i386/pci/elan520.c sys/arch/i386/pci/elan520.c
index bf077dd..b97476b 100644
--- sys/arch/i386/pci/elan520.c
+++ sys/arch/i386/pci/elan520.c
@@ -68,10 +68,11 @@ struct elansc_softc {
struct timecounter  sc_tc;
 } *elansc;
 
 intelansc_match(struct device *, void *, void *);
 void   elansc_attach(struct device *, struct device *, void *);
+intelansc_activate(struct device *, int);
 void   elansc_update_cpuspeed(void);
 void   elansc_setperf(int);
 intelansc_cpuspeed(int *);
 
 void   elansc_wdogctl(struct elansc_softc *, int, uint16_t);
@@ -84,11 +85,12 @@ voidelansc_gpio_pin_write(void *, int, int);
 void   elansc_gpio_pin_ctl(void *, int, int);
 
 u_int  

Re: tcpdump follows incorrect alignment requirement rules

2014-12-08 Thread Mike Belopuhov
On Wed, Dec 03, 2014 at 14:56 +0100, Mike Belopuhov wrote:
 bpf aligns data following the datalink header (e.g. ethernet)
 on the BPF_ALIGNMENT boundary.  Since rev1.41 of bpf.h it's
 uint32_t instead of a long.  And also since then almost all
 packets become unaligned from the tcpdump perspective and
 require costly copies into the internal buffer.  Neither IP
 header (struct ip) nor IPv6 (struct ip6_hdr) have fields
 larger than 32 bits and therefore alignment requirements for
 them are at most 32 bit.
 
 I've successfully tested this diff on sparc64 and amd64.
 
 OK?
 

Still looking for OKs.

 diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c
 index 1839869..60946a1 100644
 --- usr.sbin/tcpdump/print-ip.c
 +++ usr.sbin/tcpdump/print-ip.c
 @@ -368,11 +368,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
   /*
* If the IP header is not aligned, copy into abuf.
* This will never happen with BPF.  It does happen with raw packet
* dumps from -r.
*/
 - if ((intptr_t)ip  (sizeof(long)-1)) {
 + if ((intptr_t)ip  (sizeof(u_int32_t)-1)) {
   static u_char *abuf = NULL;
   static int didwarn = 0;
   int clen = snapend - bp;
  
   if (clen  snaplen)
 diff --git usr.sbin/tcpdump/print-ip6.c usr.sbin/tcpdump/print-ip6.c
 index cca8495..ea3a9b3 100644
 --- usr.sbin/tcpdump/print-ip6.c
 +++ usr.sbin/tcpdump/print-ip6.c
 @@ -70,11 +70,11 @@ ip6_print(register const u_char *bp, register u_int 
 length)
   /*
* The IP header is not word aligned, so copy into abuf.
* This will never happen with BPF.  It does happen with
* raw packet dumps from -r.
*/
 - if ((intptr_t)ip6  (sizeof(long)-1)) {
 + if ((intptr_t)ip6  (sizeof(u_int32_t)-1)) {
   static u_char *abuf = NULL;
   static int didwarn = 0;
   int clen = snapend - bp;
   if (clen  snaplen)
   clen = snaplen;



Re: operations on nd_prefix list must take rdomain into account

2014-12-08 Thread Mike Belopuhov
Ping.

On Fri, Nov 28, 2014 at 13:40 +0100, Mike Belopuhov wrote:
 Still looking for OK's.
 
 On Wed, Nov 26, 2014 at 18:24 +0100, Mike Belopuhov wrote:
  More rdomain checks are needed to be able to use the same subnet
  in a back to back connection between IPv6 rdomains as pointed out
  by mpi@.
  
  OK?
  
  diff --git sys/netinet6/nd6.c sys/netinet6/nd6.c
  index 9616187..d704cd6 100644
  --- sys/netinet6/nd6.c
  +++ sys/netinet6/nd6.c
  @@ -1264,10 +1264,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet 
  *ifp)
  s = splsoftnet();
  /* First purge the addresses referenced by a prefix. */
  LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) {
  struct in6_ifaddr *ia6, *ia6_next;
   
  +   if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
  +   continue;
  +
  if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr))
  continue; /* XXX */
   
  /* do we really have to remove addresses as well? */
  TAILQ_FOREACH_SAFE(ia6, in6_ifaddr, ia_list, ia6_next) 
  {
  @@ -1282,10 +1285,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet 
  *ifp)
   * Purging the addresses might remove the prefix as well.
   * So run the loop again to access only prefixes that have
   * not been freed already.
   */
  LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) {
  +   if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
  +   continue;
  +
  if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr))
  continue; /* XXX */
   
  prelist_remove(pr);
  }
  diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c
  index bfc9c9f..e46b3b4 100644
  --- sys/netinet6/nd6_rtr.c
  +++ sys/netinet6/nd6_rtr.c
  @@ -1690,10 +1690,13 @@ nd6_prefix_onlink(struct nd_prefix *pr)
   * interface, and the prefix has already installed the interface route.
   * Although such a configuration is expected to be rare, we explicitly
   * allow it.
   */
  LIST_FOREACH(opr, nd_prefix, ndpr_entry) {
  +   if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
  +   continue;
  +
  if (opr == pr)
  continue;
   
  if ((opr-ndpr_stateflags  NDPRF_ONLINK) == 0)
  continue;
  @@ -1826,10 +1829,13 @@ nd6_prefix_offlink(struct nd_prefix *pr)
   * the interface route (see comments in nd6_prefix_onlink).
   * If there's one, try to make the prefix on-link on the
   * interface.
   */
  LIST_FOREACH(opr, nd_prefix, ndpr_entry) {
  +   if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
  +   continue;
  +
  if (opr == pr)
  continue;
   
  if ((opr-ndpr_stateflags  NDPRF_ONLINK) != 0)
  continue;



Re: ix(4) and Unsupported SFP+ Module

2014-12-05 Thread Mike Belopuhov
On Fri, Dec 05, 2014 at 11:24 +0100, Gabriel Linder wrote:
 Hi,
 
 On a 5.6-release I have an ix card which refuses to work with unsupported 
 SFP+ modules, saying this : ix0 at pci1 dev 0 function 0 Intel 82599 rev 
 0x01Unsupported SFP+ Module.
 
 However, this seems to be an artificial limitation of the ix driver : the 
 following diff (against sys.tar.gz as of 5.6-release) allows my SFP+ and it 
 works fine so far.
 
 The patch was made in a hurry, so I probably have cut more than I should... 
 Is there a simpler/better way to fix this ?
 
 

I think this is roughly what I have recently committed to -current.
But hey, thanks for taking your time for looking into this!



tcpdump follows incorrect alignment requirement rules

2014-12-03 Thread Mike Belopuhov
bpf aligns data following the datalink header (e.g. ethernet)
on the BPF_ALIGNMENT boundary.  Since rev1.41 of bpf.h it's
uint32_t instead of a long.  And also since then almost all
packets become unaligned from the tcpdump perspective and
require costly copies into the internal buffer.  Neither IP
header (struct ip) nor IPv6 (struct ip6_hdr) have fields
larger than 32 bits and therefore alignment requirements for
them are at most 32 bit.

I've successfully tested this diff on sparc64 and amd64.

OK?

diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c
index 1839869..60946a1 100644
--- usr.sbin/tcpdump/print-ip.c
+++ usr.sbin/tcpdump/print-ip.c
@@ -368,11 +368,11 @@ ip_print(register const u_char *bp, register u_int length)
/*
 * If the IP header is not aligned, copy into abuf.
 * This will never happen with BPF.  It does happen with raw packet
 * dumps from -r.
 */
-   if ((intptr_t)ip  (sizeof(long)-1)) {
+   if ((intptr_t)ip  (sizeof(u_int32_t)-1)) {
static u_char *abuf = NULL;
static int didwarn = 0;
int clen = snapend - bp;
 
if (clen  snaplen)
diff --git usr.sbin/tcpdump/print-ip6.c usr.sbin/tcpdump/print-ip6.c
index cca8495..ea3a9b3 100644
--- usr.sbin/tcpdump/print-ip6.c
+++ usr.sbin/tcpdump/print-ip6.c
@@ -70,11 +70,11 @@ ip6_print(register const u_char *bp, register u_int length)
/*
 * The IP header is not word aligned, so copy into abuf.
 * This will never happen with BPF.  It does happen with
 * raw packet dumps from -r.
 */
-   if ((intptr_t)ip6  (sizeof(long)-1)) {
+   if ((intptr_t)ip6  (sizeof(u_int32_t)-1)) {
static u_char *abuf = NULL;
static int didwarn = 0;
int clen = snapend - bp;
if (clen  snaplen)
clen = snaplen;



Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned

2014-11-28 Thread Mike Belopuhov
Still looking for OK's.

On Wed, Nov 26, 2014 at 17:18 +0100, Mike Belopuhov wrote:
  
  better diff.  the problem is that dissectors use packetp and
  snapend pointers themselves therefore they should be pointing
  to the newly allocated structure.  we can restore them once
  we're done with the inner content and go back to the caller
  to see if we need to hexdump the contents.
  
  i'll see if i can cook and test the ipv6 version.
  
  OK?
  
 
 now with an ip6 version and i've made sure that this fixes
 dumping unaligned ipv6 packets as well.  in the meantime
 jsg@ has lured me into looking at the afl crash in the same
 code and it looks like the check from ip6_print is useful
 here: if we haven't got enough data for a header, don't
 bother with anything else and just bail.
 
 ok?
 
 diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c
 index 3f4194c..e9d2185 100644
 --- usr.sbin/tcpdump/print-ip.c
 +++ usr.sbin/tcpdump/print-ip.c
 @@ -351,22 +351,27 @@ in_cksum(const u_short *addr, register int len, int 
 csum)
   * print an IP datagram.
   */
  void
  ip_print(register const u_char *bp, register u_int length)
  {
 + static u_char *abuf = NULL;
   register const struct ip *ip;
   register u_int hlen, len, off;
   register const u_char *cp;
 + const u_char *pktp = packetp;
 + const u_char *send = snapend;
  
   ip = (const struct ip *)bp;
 + if ((u_char *)(ip + 1)  snapend) {
 + printf([|ip]);
 + return;
 + }
 +
   /*
* If the IP header is not aligned, copy into abuf.
 -  * This will never happen with BPF.  It does happen with raw packet
 -  * dumps from -r.
*/
   if ((intptr_t)ip  (sizeof(long)-1)) {
 - static u_char *abuf = NULL;
   static int didwarn = 0;
   int clen = snapend - bp;
  
   if (clen  snaplen)
   clen = snaplen;
 @@ -387,11 +392,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
   }
  
   TCHECK(*ip);
   if (ip-ip_v != IPVERSION) {
   (void)printf(bad-ip-version %u, ip-ip_v);
 - return;
 + goto out;
   }
  
   len = ntohs(ip-ip_len);
   if (length  len) {
   (void)printf(truncated-ip - %d bytes missing!,
 @@ -400,11 +405,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
   }
  
   hlen = ip-ip_hl * 4;
   if (hlen  sizeof(struct ip) || hlen  len) {
   (void)printf(bad-hlen %d, hlen);
 - return;
 + goto out;
   }
  
   len -= hlen;
  
   /*
 @@ -465,11 +470,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
ipaddr_string(ip-ip_src),
ipaddr_string(ip-ip_dst));
   ip_print(cp, len);
   if (! vflag) {
   printf( (encap));
 - return;
 + goto out;
   }
   break;
  
  #ifdef INET6
  #ifndef IPPROTO_IPV6
 @@ -482,11 +487,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
ipaddr_string(ip-ip_src),
ipaddr_string(ip-ip_dst));
   ip6_print(cp, len);
   if (! vflag) {
   printf( (encap));
 - return;
 + goto out;
   }
   break;
  #endif /*INET6*/
  
  #ifndef IPPROTO_GRE
 @@ -499,11 +504,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
ipaddr_string(ip-ip_dst));
   /* do it */
   gre_print(cp, len);
   if (! vflag) {
   printf( (gre encap));
 - return;
 + goto out;
   }
   break;
  
  #ifndef IPPROTO_ESP
  #define IPPROTO_ESP 50
 @@ -528,11 +533,11 @@ ip_print(register const u_char *bp, register u_int 
 length)
ipaddr_string(ip-ip_src),
ipaddr_string(ip-ip_dst));
   mobile_print(cp, len);
   if (! vflag) {
   printf( (mobile encap));
 - return;
 + goto out;
   }
   break;
  
  #ifndef IPPROTO_ETHERIP
  #define IPPROTO_ETHERIP  97
 @@ -653,10 +658,13 @@ ip_print(register const u_char *bp, register u_int 
 length)
   (void)printf(%soptlen=%d, sep, hlen);
   ip_optprint((u_char *)(ip + 1), hlen

Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned

2014-11-27 Thread Mike Belopuhov
On 27 November 2014 at 03:12, Theo de Raadt dera...@cvs.openbsd.org wrote:
 On Tue, Nov 25, 2014 at 18:42 +0100, Mike Belopuhov wrote:
  On Mon, Nov 24, 2014 at 19:04 +0100, Mike Belopuhov wrote:
   Hi,
  
   IP header is not always aligned since bpf copies out the mbuf
   chain into the contigous buffer provided by the userland.  I've
   seen this with large packet sizes on VLANs.  ip_print will then
   copy the packet but the Ethernet header into the internal buffer
   so that it can cast it to the IP header structure and update
   global packetp and snapend pointers hence preventing the -Xx
   dumping code from printing out the Ethernet header itself.
  
   Diff below fixes it.  OK?
  
 
  better diff.  the problem is that dissectors use packetp and
  snapend pointers themselves therefore they should be pointing
  to the newly allocated structure.  we can restore them once
  we're done with the inner content and go back to the caller
  to see if we need to hexdump the contents.
 
  i'll see if i can cook and test the ipv6 version.
 
  OK?
 

 now with an ip6 version and i've made sure that this fixes
 dumping unaligned ipv6 packets as well.  in the meantime
 jsg@ has lured me into looking at the afl crash in the same
 code and it looks like the check from ip6_print is useful
 here: if we haven't got enough data for a header, don't
 bother with anything else and just bail.

 ok?


 Did you test on a strict alignment machine?

works fine on sparc64.



Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned

2014-11-26 Thread Mike Belopuhov
On Tue, Nov 25, 2014 at 18:42 +0100, Mike Belopuhov wrote:
 On Mon, Nov 24, 2014 at 19:04 +0100, Mike Belopuhov wrote:
  Hi,
  
  IP header is not always aligned since bpf copies out the mbuf
  chain into the contigous buffer provided by the userland.  I've
  seen this with large packet sizes on VLANs.  ip_print will then
  copy the packet but the Ethernet header into the internal buffer
  so that it can cast it to the IP header structure and update
  global packetp and snapend pointers hence preventing the -Xx
  dumping code from printing out the Ethernet header itself.
  
  Diff below fixes it.  OK?
  
 
 better diff.  the problem is that dissectors use packetp and
 snapend pointers themselves therefore they should be pointing
 to the newly allocated structure.  we can restore them once
 we're done with the inner content and go back to the caller
 to see if we need to hexdump the contents.
 
 i'll see if i can cook and test the ipv6 version.
 
 OK?
 

now with an ip6 version and i've made sure that this fixes
dumping unaligned ipv6 packets as well.  in the meantime
jsg@ has lured me into looking at the afl crash in the same
code and it looks like the check from ip6_print is useful
here: if we haven't got enough data for a header, don't
bother with anything else and just bail.

ok?

diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c
index 3f4194c..e9d2185 100644
--- usr.sbin/tcpdump/print-ip.c
+++ usr.sbin/tcpdump/print-ip.c
@@ -351,22 +351,27 @@ in_cksum(const u_short *addr, register int len, int csum)
  * print an IP datagram.
  */
 void
 ip_print(register const u_char *bp, register u_int length)
 {
+   static u_char *abuf = NULL;
register const struct ip *ip;
register u_int hlen, len, off;
register const u_char *cp;
+   const u_char *pktp = packetp;
+   const u_char *send = snapend;
 
ip = (const struct ip *)bp;
+   if ((u_char *)(ip + 1)  snapend) {
+   printf([|ip]);
+   return;
+   }
+
/*
 * If the IP header is not aligned, copy into abuf.
-* This will never happen with BPF.  It does happen with raw packet
-* dumps from -r.
 */
if ((intptr_t)ip  (sizeof(long)-1)) {
-   static u_char *abuf = NULL;
static int didwarn = 0;
int clen = snapend - bp;
 
if (clen  snaplen)
clen = snaplen;
@@ -387,11 +392,11 @@ ip_print(register const u_char *bp, register u_int length)
}
 
TCHECK(*ip);
if (ip-ip_v != IPVERSION) {
(void)printf(bad-ip-version %u, ip-ip_v);
-   return;
+   goto out;
}
 
len = ntohs(ip-ip_len);
if (length  len) {
(void)printf(truncated-ip - %d bytes missing!,
@@ -400,11 +405,11 @@ ip_print(register const u_char *bp, register u_int length)
}
 
hlen = ip-ip_hl * 4;
if (hlen  sizeof(struct ip) || hlen  len) {
(void)printf(bad-hlen %d, hlen);
-   return;
+   goto out;
}
 
len -= hlen;
 
/*
@@ -465,11 +470,11 @@ ip_print(register const u_char *bp, register u_int length)
 ipaddr_string(ip-ip_src),
 ipaddr_string(ip-ip_dst));
ip_print(cp, len);
if (! vflag) {
printf( (encap));
-   return;
+   goto out;
}
break;
 
 #ifdef INET6
 #ifndef IPPROTO_IPV6
@@ -482,11 +487,11 @@ ip_print(register const u_char *bp, register u_int length)
 ipaddr_string(ip-ip_src),
 ipaddr_string(ip-ip_dst));
ip6_print(cp, len);
if (! vflag) {
printf( (encap));
-   return;
+   goto out;
}
break;
 #endif /*INET6*/
 
 #ifndef IPPROTO_GRE
@@ -499,11 +504,11 @@ ip_print(register const u_char *bp, register u_int length)
 ipaddr_string(ip-ip_dst));
/* do it */
gre_print(cp, len);
if (! vflag) {
printf( (gre encap));
-   return;
+   goto out;
}
break;
 
 #ifndef IPPROTO_ESP
 #define IPPROTO_ESP 50
@@ -528,11 +533,11 @@ ip_print(register const u_char *bp, register u_int length)
 ipaddr_string(ip-ip_src),
 ipaddr_string(ip-ip_dst

operations on nd_prefix list must take rdomain into account

2014-11-26 Thread Mike Belopuhov
More rdomain checks are needed to be able to use the same subnet
in a back to back connection between IPv6 rdomains as pointed out
by mpi@.

OK?

diff --git sys/netinet6/nd6.c sys/netinet6/nd6.c
index 9616187..d704cd6 100644
--- sys/netinet6/nd6.c
+++ sys/netinet6/nd6.c
@@ -1264,10 +1264,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp)
s = splsoftnet();
/* First purge the addresses referenced by a prefix. */
LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) {
struct in6_ifaddr *ia6, *ia6_next;
 
+   if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
+   continue;
+
if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr))
continue; /* XXX */
 
/* do we really have to remove addresses as well? */
TAILQ_FOREACH_SAFE(ia6, in6_ifaddr, ia_list, ia6_next) 
{
@@ -1282,10 +1285,13 @@ nd6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp)
 * Purging the addresses might remove the prefix as well.
 * So run the loop again to access only prefixes that have
 * not been freed already.
 */
LIST_FOREACH_SAFE(pr, nd_prefix, ndpr_entry, npr) {
+   if (pr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
+   continue;
+
if (IN6_IS_ADDR_LINKLOCAL(pr-ndpr_prefix.sin6_addr))
continue; /* XXX */
 
prelist_remove(pr);
}
diff --git sys/netinet6/nd6_rtr.c sys/netinet6/nd6_rtr.c
index bfc9c9f..e46b3b4 100644
--- sys/netinet6/nd6_rtr.c
+++ sys/netinet6/nd6_rtr.c
@@ -1690,10 +1690,13 @@ nd6_prefix_onlink(struct nd_prefix *pr)
 * interface, and the prefix has already installed the interface route.
 * Although such a configuration is expected to be rare, we explicitly
 * allow it.
 */
LIST_FOREACH(opr, nd_prefix, ndpr_entry) {
+   if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
+   continue;
+
if (opr == pr)
continue;
 
if ((opr-ndpr_stateflags  NDPRF_ONLINK) == 0)
continue;
@@ -1826,10 +1829,13 @@ nd6_prefix_offlink(struct nd_prefix *pr)
 * the interface route (see comments in nd6_prefix_onlink).
 * If there's one, try to make the prefix on-link on the
 * interface.
 */
LIST_FOREACH(opr, nd_prefix, ndpr_entry) {
+   if (opr-ndpr_ifp-if_rdomain != ifp-if_rdomain)
+   continue;
+
if (opr == pr)
continue;
 
if ((opr-ndpr_stateflags  NDPRF_ONLINK) != 0)
continue;



Re: iked control process crash at startup

2014-11-25 Thread Mike Belopuhov
On 25 November 2014 at 13:11, Vincent Gross dermi...@kilob.yt wrote:
 Hi tech@,

 I've been using iked for some weeks to tunnel my laptop to home over 3G.
 Sunday I upgraded my laptop to the latest snapshot; previous upgrade was
 about 2 or 3 weeks ago. When I started iked, it crashed randomly, as in
 one time it runs just fine and completes the handshake, the other it
 crashes before even sending the first packet.

 I ran ktrace -di /sbin/iked and kdump'd the resulting file. Of the 5
 processes, 4 finished by calling exit(0), one was terminated on a
 SIGSEGV. As it is also the only one that do stuff on /var/run/iked.sock,
 it is the control process. I repeated the above ktrace 4 times and got
 consistent results: SIGSEGV'd control process.

 I'll keep the hunt going, but I am not sure how long this will take nor
 how much time I'll have to spare, so here is the control process kdump.

 Cheers,

 --
 Vincent



can you please provide iked -dvv output when it crashes and a trace
from the core file.  to generate a core file compile iked like so:

  # cd /usr/src/sbin/iked
  # make clean
  # make obj
  # make DEBUG=-g

setup core file generation:

  # mkdir /var/crash/iked
  # chmod 700 /var/crash/iked
  # sysctl kern.nosuidcoredump=3

run /usr/src/sbin/iked/obj/iked -dvv and once it crashes retrieve
the full backtrace from the core file:

  # cd /usr/src/sbin/iked
  # gdb -c /var/crash/iked/4970.core ./obj/iked
  (gdb) bt full

and send this along with the iked output.  please make sure that
you pick the correct core file so that it corresponds to the debug
output.



Re: Simplify in_broadcast()

2014-11-25 Thread Mike Belopuhov
On 20 November 2014 at 15:24, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Diff below make the function always iterate on all the interfaces.

 After that I'd like to change ifa_ifwithaddr() to only match unicast
 addresses and use in_broadcast() in the few places where we also accept
 broadcast addresses.
 This would prevent people from matching a broadcast address when they
 don't want to.

 Ok?


OK



Re: Rename rt_ifa_addloop(9)

2014-11-24 Thread Mike Belopuhov
On 20 November 2014 at 15:37, Martin Pieuchot mpieuc...@nolizard.org wrote:
 When I decided to use in6_ifaddloop() for IPv4 I barely though about
 the name of the function.  Recently mikeb@ told me that the name is
 confusing, especially because I'm trying to turn the loopback hack
 into local routes.

 So here's a diff to rename these functions and to make them return
 the error code of rtrequest9(9), ignored for the moment.

 Ok?


looks good to me.



Re: Make every interface with a watchdog register it's own timeout

2014-11-24 Thread Mike Belopuhov
On Sun, Nov 23, 2014 at 13:39 +0100, Claudio Jeker wrote:
 On Sun, Nov 23, 2014 at 02:10:24AM +0100, Mike Belopuhov wrote:
  Hi,
  
  This removes the system wide if_slowtimo timeout and lets every
  interface with a valid if_watchdog method register it's own.
  The rational is to get rid of the ifnet loop in the softclock
  context to avoid further complications with concurrent access
  to the ifnet list.  This might also save some cpu cycles in the
  runtime if number of interfaces is large while only a fraction
  all of them have a watchdog routine.
  
  This has originally come up on NetBSD mailing lists with some
  rather complicated locking strategy.  Masao Uebayashi has later
  proposed a similar solution.
  
  The diff was looked at and corrected by mpi@.  I've tested the
  trunk and a system build as well.  OK?
  
 
 Please do not use M_TEMP for this (this is not a temporary buffer).

There's nothing wrong with M_TEMP usage here.  In fact all the other
allocations are done with M_TEMP in if.c.

 Maybe even remove the need to allocate the struct timeout (which requires
 if_var.h to include sys/timeout.h and adds 24 / 40 bytes to ifnet even
 when there is no watchdog used (like on vlan)).
 

That would be nice, however, I don't think right now it matters too
much.  Some userland apps are still using kvm to get their hands on
the ifnet structure and I'm not 100% positive we should expose
sys/timeout.h via if_var.h or teach them to include it on it's own.

Historically it seems that if.h was overused in userland hence
there are plenty of other things (list heads for example) that get
allocated while they clearly should be part of the ifnet.  A
thorough clean up is long due.

 In any case I think M_DEVBUF is the right type.
 
 We could also rename if_slowtimo to something more concrete like
 if_watchdog_timeout but that's just a bikeshed.
 

We can, but as you said it's bikeshedding.  I'm not saving the
world here (c) Theo.



Re: Make every interface with a watchdog register it's own timeout

2014-11-24 Thread Mike Belopuhov
On Sun, Nov 23, 2014 at 12:06 +0100, Martin Pieuchot wrote:
 On 23/11/14(Sun) 02:10, Mike Belopuhov wrote:
  Hi,
  
  This removes the system wide if_slowtimo timeout and lets every
  interface with a valid if_watchdog method register it's own.
  The rational is to get rid of the ifnet loop in the softclock
  context to avoid further complications with concurrent access
  to the ifnet list.  This might also save some cpu cycles in the
  runtime if number of interfaces is large while only a fraction
  all of them have a watchdog routine.
  
  This has originally come up on NetBSD mailing lists with some
  rather complicated locking strategy.  Masao Uebayashi has later
  proposed a similar solution.
  
  The diff was looked at and corrected by mpi@.  I've tested the
  trunk and a system build as well.  OK?
 
 I'm basically ok, some remarks below.
 
 
 What about putting the allocation inside if_attach_common() with the
 hooks that are allocated for the same reason?

Sure.

 Maybe we should explain why we allocate them?
 

We all know why, don't think it will help anything.

New version.  OK?


diff --git sys/net/if.c sys/net/if.c
index 0632eec..679c8c8 100644
--- sys/net/if.c
+++ sys/net/if.c
@@ -133,11 +133,10 @@ void  if_attachdomain1(struct ifnet *);
 void   if_attach_common(struct ifnet *);
 
 void   if_detach_queues(struct ifnet *, struct ifqueue *);
 void   if_detached_start(struct ifnet *);
 intif_detached_ioctl(struct ifnet *, u_long, caddr_t);
-void   if_detached_watchdog(struct ifnet *);
 
 intif_getgroup(caddr_t, struct ifnet *);
 intif_getgroupmembers(caddr_t);
 intif_getgroupattribs(caddr_t);
 intif_setgroupattribs(caddr_t);
@@ -169,16 +168,12 @@ int   net_livelocked(void);
  * parameters.
  */
 void
 ifinit()
 {
-   static struct timeout if_slowtim;
-
-   timeout_set(if_slowtim, if_slowtimo, if_slowtim);
timeout_set(net_tick_to, net_tick, net_tick_to);
 
-   if_slowtimo(if_slowtim);
net_tick(net_tick_to);
 }
 
 static unsigned int if_index = 0;
 static unsigned int if_indexlim = 0;
@@ -270,10 +265,13 @@ if_attachsetup(struct ifnet *ifp)
if_attachdomain1(ifp);
 #if NPF  0
pfi_attach_ifnet(ifp);
 #endif
 
+   timeout_set(ifp-if_slowtimo, if_slowtimo, ifp);
+   if_slowtimo(ifp);
+
/* Announce the interface. */
rt_ifannouncemsg(ifp, IFAN_ARRIVAL);
 }
 
 /*
@@ -426,10 +424,13 @@ if_attach_common(struct ifnet *ifp)
M_TEMP, M_WAITOK);
TAILQ_INIT(ifp-if_linkstatehooks);
ifp-if_detachhooks = malloc(sizeof(*ifp-if_detachhooks),
M_TEMP, M_WAITOK);
TAILQ_INIT(ifp-if_detachhooks);
+
+   ifp-if_slowtimo = malloc(sizeof(*ifp-if_slowtimo), M_TEMP,
+   M_WAITOK|M_ZERO);
 }
 
 void
 if_start(struct ifnet *ifp)
 {
@@ -481,19 +482,22 @@ if_detach(struct ifnet *ifp)
struct domain *dp;
 
ifp-if_flags = ~IFF_OACTIVE;
ifp-if_start = if_detached_start;
ifp-if_ioctl = if_detached_ioctl;
-   ifp-if_watchdog = if_detached_watchdog;
+   ifp-if_watchdog = NULL;
 
/*
 * Call detach hooks from head to tail.  To make sure detach
 * hooks are executed in the reverse order they were added, all
 * the hooks have to be added to the head!
 */
dohooks(ifp-if_detachhooks, HOOK_REMOVE | HOOK_FREE);
 
+   /* Remove the watchdog timeout */
+   timeout_del(ifp-if_slowtimo);
+
 #if NBRIDGE  0
/* Remove the interface from any bridge it is part of.  */
if (ifp-if_bridgeport)
bridge_ifdetach(ifp);
 #endif
@@ -577,10 +581,12 @@ do { \
 
free(ifp-if_addrhooks, M_TEMP, 0);
free(ifp-if_linkstatehooks, M_TEMP, 0);
free(ifp-if_detachhooks, M_TEMP, 0);
 
+   free(ifp-if_slowtimo, M_TEMP, sizeof(*ifp-if_slowtimo));
+
for (dp = domains; dp; dp = dp-dom_next) {
if (dp-dom_ifdetach  ifp-if_afdata[dp-dom_family])
(*dp-dom_ifdetach)(ifp,
ifp-if_afdata[dp-dom_family]);
}
@@ -1160,29 +1166,26 @@ if_link_state_change_task(void *arg, void *unused)
}
splx(s);
 }
 
 /*
- * Handle interface watchdog timer routines.  Called
- * from softclock, we decrement timers (if set) and
+ * Handle interface watchdog timer routine.  Called
+ * from softclock, we decrement timer (if set) and
  * call the appropriate interface routine on expiration.
  */
 void
 if_slowtimo(void *arg)
 {
-   struct timeout *to = (struct timeout *)arg;
-   struct ifnet *ifp;
+   struct ifnet *ifp = arg;
int s = splnet();
 
-   TAILQ_FOREACH(ifp, ifnet, if_list) {
-   if (ifp-if_timer == 0 || --ifp-if_timer)
-   continue;
-   if (ifp-if_watchdog)
+   if (ifp-if_watchdog) {
+   if (ifp-if_timer  0  --ifp-if_timer == 0)
(*ifp-if_watchdog)(ifp);
+   timeout_add(ifp

Trimming tcpdump a bit

2014-11-24 Thread Mike Belopuhov
Hi,

I've been trying to fix a bug in tcpdump but the rottenness
of the current code base with it's horrendous APIs is just
getting in the way.  What if we trimmed it a bit, say killed
all those pesky 'register' values, kill protocols that we
cannot really test (appletalk, fddi, etc.), kill dissectors
that are barely implemented (krb for instance).  Would anyone
be against this?  Does anyone have any plans to sync it up
with the upstream?

Here's a 91K diff to deregister tcpdump.
http://www.vantronix.net/~mike/tcpdump-deregister.diff

OK?



Re: Trimming tcpdump a bit

2014-11-24 Thread Mike Belopuhov
On 24 November 2014 at 17:20, Mike Belopuhov m...@belopuhov.com wrote:
 On 24 November 2014 at 16:42, Mike Belopuhov m...@belopuhov.com wrote:
 Hi,

 I've been trying to fix a bug in tcpdump but the rottenness
 of the current code base with it's horrendous APIs is just
 getting in the way.  What if we trimmed it a bit, say killed
 all those pesky 'register' values, kill protocols that we
 cannot really test (appletalk, fddi, etc.), kill dissectors
 that are barely implemented (krb for instance).  Would anyone
 be against this?  Does anyone have any plans to sync it up
 with the upstream?

 Here's a 91K diff to deregister tcpdump.
 http://www.vantronix.net/~mike/tcpdump-deregister.diff

 OK?

 http://www.vantronix.net/~mike/tcpdump-atm.diff
 http://www.vantronix.net/~mike/tcpdump-fddi.diff
 http://www.vantronix.net/~mike/tcpdump-slip.diff

 SLIP diff has an added benefit that /usr/include/net/slip.h
 will no longer be needed.

AppleTalk?
http://www.vantronix.net/~mike/tcpdump-atalk.diff

DECnet?
http://www.vantronix.net/~mike/tcpdump-decnet.diff



Re: Trimming tcpdump a bit

2014-11-24 Thread Mike Belopuhov
On 24 November 2014 at 16:42, Mike Belopuhov m...@belopuhov.com wrote:
 Hi,

 I've been trying to fix a bug in tcpdump but the rottenness
 of the current code base with it's horrendous APIs is just
 getting in the way.  What if we trimmed it a bit, say killed
 all those pesky 'register' values, kill protocols that we
 cannot really test (appletalk, fddi, etc.), kill dissectors
 that are barely implemented (krb for instance).  Would anyone
 be against this?  Does anyone have any plans to sync it up
 with the upstream?

 Here's a 91K diff to deregister tcpdump.
 http://www.vantronix.net/~mike/tcpdump-deregister.diff

 OK?

Apparently the answer is no.  I'm withdrawing all diffs.



tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned

2014-11-24 Thread Mike Belopuhov
Hi,

IP header is not always aligned since bpf copies out the mbuf
chain into the contigous buffer provided by the userland.  I've
seen this with large packet sizes on VLANs.  ip_print will then
copy the packet but the Ethernet header into the internal buffer
so that it can cast it to the IP header structure and update
global packetp and snapend pointers hence preventing the -Xx
dumping code from printing out the Ethernet header itself.

Diff below fixes it.  OK?

diff --git usr.sbin/tcpdump/print-ip.c usr.sbin/tcpdump/print-ip.c
index 3f4194c..6d90b3d 100644
--- usr.sbin/tcpdump/print-ip.c
+++ usr.sbin/tcpdump/print-ip.c
@@ -252,7 +252,7 @@ ip_printroute(const char *type, register const u_char *cp, 
u_int length)
  * print IP options.
  */
 static void
-ip_optprint(register const u_char *cp, u_int length)
+ip_optprint(register const u_char *cp, u_char *send, u_int length)
 {
register u_int len;
int tt;
@@ -265,7 +265,7 @@ ip_optprint(register const u_char *cp, u_int length)
printf([|ip op len %d], len);
return;
}
-   if (cp[1] = snapend || cp + len  snapend) {
+   if (cp[1] = send || cp + len  send) {
printf([|ip]);
return;
}
@@ -356,12 +356,11 @@ ip_print(register const u_char *bp, register u_int length)
register const struct ip *ip;
register u_int hlen, len, off;
register const u_char *cp;
+   u_char *send = snapend;
 
ip = (const struct ip *)bp;
/*
 * If the IP header is not aligned, copy into abuf.
-* This will never happen with BPF.  It does happen with raw packet
-* dumps from -r.
 */
if ((intptr_t)ip  (sizeof(long)-1)) {
static u_char *abuf = NULL;
@@ -376,8 +375,7 @@ ip_print(register const u_char *bp, register u_int length)
error(ip_print: malloc);
}
memmove((char *)abuf, (char *)ip, min(length, clen));
-   snapend = abuf + clen;
-   packetp = abuf;
+   send = abuf + clen;
ip = (struct ip *)abuf;
/* We really want libpcap to give us aligned packets */
if (!didwarn) {
@@ -538,7 +536,7 @@ ip_print(register const u_char *bp, register u_int length)
 #define IPPROTO_ETHERIP97
 #endif
case IPPROTO_ETHERIP:
-   etherip_print(cp, snapend - cp, len,
+   etherip_print(cp, send - cp, len,
(const u_char *)ip);
break;
 
@@ -573,7 +571,7 @@ ip_print(register const u_char *bp, register u_int length)
 #endif
case IPPROTO_PFSYNC:
pfsync_ip_print(cp,
-   (int)(snapend - (u_char *)ip) - hlen,
+   (int)(send - (u_char *)ip) - hlen,
(const u_char *)ip);
break;
 
@@ -638,7 +636,7 @@ ip_print(register const u_char *bp, register u_int length)
}
(void)printf(%slen %u, sep, ntohs(ip-ip_len));
sep = , ;
-   if ((u_char *)ip + hlen = snapend) {
+   if ((u_char *)ip + hlen = send) {
u_int16_t sum, ip_sum;
sum = in_cksum((const u_short *)ip, hlen, 0);
if (sum != 0) {
@@ -651,7 +649,7 @@ ip_print(register const u_char *bp, register u_int length)
if (hlen  sizeof(struct ip)) {
hlen -= sizeof(struct ip);
(void)printf(%soptlen=%d, sep, hlen);
-   ip_optprint((u_char *)(ip + 1), hlen);
+   ip_optprint((u_char *)(ip + 1), send, hlen);
}
printf());
}



Re: tcpdump: Ethernet header is not dumped with -xX if IP header is unaligned

2014-11-24 Thread Mike Belopuhov
On Nov 24, 2014 7:10 PM, Mike Belopuhov m...@belopuhov.com wrote:

 Hi,

 IP header is not always aligned since bpf copies out the mbuf
 chain into the contigous buffer provided by the userland.  I've
 seen this with large packet sizes on VLANs.  ip_print will then
 copy the packet but the Ethernet header into the internal buffer
 so that it can cast it to the IP header structure and update
 global packetp and snapend pointers hence preventing the -Xx
 dumping code from printing out the Ethernet header itself.

 Diff below fixes it.  OK?


Err, I think I've sent the wrong diff.  Please ignore it.  I'll resend
it tomorrow.


Make every interface with a watchdog register it's own timeout

2014-11-22 Thread Mike Belopuhov
Hi,

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register it's own.
The rational is to get rid of the ifnet loop in the softclock
context to avoid further complications with concurrent access
to the ifnet list.  This might also save some cpu cycles in the
runtime if number of interfaces is large while only a fraction
all of them have a watchdog routine.

This has originally come up on NetBSD mailing lists with some
rather complicated locking strategy.  Masao Uebayashi has later
proposed a similar solution.

The diff was looked at and corrected by mpi@.  I've tested the
trunk and a system build as well.  OK?

Index: sys/net/if.c
===
RCS file: /home/cvs/src/sys/net/if.c,v
retrieving revision 1.303
diff -u -p -r1.303 if.c
--- sys/net/if.c3 Nov 2014 11:02:08 -   1.303
+++ sys/net/if.c22 Nov 2014 16:25:34 -
@@ -135,7 +135,6 @@ voidif_attach_common(struct ifnet *);
 void   if_detach_queues(struct ifnet *, struct ifqueue *);
 void   if_detached_start(struct ifnet *);
 intif_detached_ioctl(struct ifnet *, u_long, caddr_t);
-void   if_detached_watchdog(struct ifnet *);
 
 intif_getgroup(caddr_t, struct ifnet *);
 intif_getgroupmembers(caddr_t);
@@ -171,12 +170,8 @@ intnet_livelocked(void);
 void
 ifinit()
 {
-   static struct timeout if_slowtim;
-
-   timeout_set(if_slowtim, if_slowtimo, if_slowtim);
timeout_set(net_tick_to, net_tick, net_tick_to);
 
-   if_slowtimo(if_slowtim);
net_tick(net_tick_to);
 }
 
@@ -272,6 +267,11 @@ if_attachsetup(struct ifnet *ifp)
pfi_attach_ifnet(ifp);
 #endif
 
+   ifp-if_wdtimo = malloc(sizeof(*ifp-if_wdtimo), M_TEMP,
+   M_WAITOK|M_ZERO);
+   timeout_set(ifp-if_wdtimo, if_slowtimo, ifp);
+   if_slowtimo(ifp);
+
/* Announce the interface. */
rt_ifannouncemsg(ifp, IFAN_ARRIVAL);
 }
@@ -483,7 +483,7 @@ if_detach(struct ifnet *ifp)
ifp-if_flags = ~IFF_OACTIVE;
ifp-if_start = if_detached_start;
ifp-if_ioctl = if_detached_ioctl;
-   ifp-if_watchdog = if_detached_watchdog;
+   ifp-if_watchdog = NULL;
 
/*
 * Call detach hooks from head to tail.  To make sure detach
@@ -492,6 +492,10 @@ if_detach(struct ifnet *ifp)
 */
dohooks(ifp-if_detachhooks, HOOK_REMOVE | HOOK_FREE);
 
+   /* Remove the watchdog timeout */
+   timeout_del(ifp-if_wdtimo);
+   free(ifp-if_wdtimo, M_TEMP, sizeof(*ifp-if_wdtimo));
+
 #if NBRIDGE  0
/* Remove the interface from any bridge it is part of.  */
if (ifp-if_bridgeport)
@@ -1162,25 +1166,22 @@ if_link_state_change_task(void *arg, voi
 }
 
 /*
- * Handle interface watchdog timer routines.  Called
- * from softclock, we decrement timers (if set) and
+ * Handle interface watchdog timer routine.  Called
+ * from softclock, we decrement timer (if set) and
  * call the appropriate interface routine on expiration.
  */
 void
 if_slowtimo(void *arg)
 {
-   struct timeout *to = (struct timeout *)arg;
-   struct ifnet *ifp;
+   struct ifnet *ifp = arg;
int s = splnet();
 
-   TAILQ_FOREACH(ifp, ifnet, if_list) {
-   if (ifp-if_timer == 0 || --ifp-if_timer)
-   continue;
-   if (ifp-if_watchdog)
+   if (ifp-if_watchdog) {
+   if (ifp-if_timer  0  --ifp-if_timer == 0)
(*ifp-if_watchdog)(ifp);
+   timeout_add(ifp-if_wdtimo, hz / IFNET_SLOWHZ);
}
splx(s);
-   timeout_add(to, hz / IFNET_SLOWHZ);
 }
 
 /*
@@ -1824,12 +1825,6 @@ int
 if_detached_ioctl(struct ifnet *ifp, u_long a, caddr_t b)
 {
return ENODEV;
-}
-
-void
-if_detached_watchdog(struct ifnet *ifp)
-{
-   /* nothing */
 }
 
 /*
Index: sys/net/if_trunk.c
===
RCS file: /home/cvs/src/sys/net/if_trunk.c,v
retrieving revision 1.91
diff -u -p -r1.91 if_trunk.c
--- sys/net/if_trunk.c  18 Nov 2014 02:37:31 -  1.91
+++ sys/net/if_trunk.c  22 Nov 2014 16:25:34 -
@@ -344,11 +344,13 @@ trunk_port_create(struct trunk_softc *tr
tp-tp_iftype = ifp-if_type;
ifp-if_type = IFT_IEEE8023ADLAG;
ifp-if_tp = (caddr_t)tp;
-   tp-tp_watchdog = ifp-if_watchdog;
-   ifp-if_watchdog = trunk_port_watchdog;
tp-tp_ioctl = ifp-if_ioctl;
ifp-if_ioctl = trunk_port_ioctl;
 
+   timeout_del(ifp-if_wdtimo);
+   tp-tp_watchdog = ifp-if_watchdog;
+   ifp-if_watchdog = trunk_port_watchdog;
+
tp-tp_if = ifp;
tp-tp_trunk = tr;
 
@@ -427,8 +429,8 @@ trunk_port_destroy(struct trunk_port *tp
 
/* Restore interface */
ifp-if_type = tp-tp_iftype;
-   ifp-if_watchdog = tp-tp_watchdog;
ifp-if_ioctl = tp-tp_ioctl;
+   ifp-if_watchdog = tp-tp_watchdog;
ifp-if_tp = NULL;
 

Re: Make every interface with a watchdog register it's own timeout

2014-11-22 Thread Mike Belopuhov
On Sun, Nov 23, 2014 at 02:10 +0100, Mike Belopuhov wrote:
 Hi,
 
 This removes the system wide if_slowtimo timeout and lets every
 interface with a valid if_watchdog method register it's own.
 The rational is to get rid of the ifnet loop in the softclock
 context to avoid further complications with concurrent access
 to the ifnet list.  This might also save some cpu cycles in the
 runtime if number of interfaces is large while only a fraction
 all of them have a watchdog routine.
 
 This has originally come up on NetBSD mailing lists with some
 rather complicated locking strategy.  Masao Uebayashi has later
 proposed a similar solution.
 
 The diff was looked at and corrected by mpi@.  I've tested the
 trunk and a system build as well.  OK?
 
[snip]
 Index: sys/net/if_var.h
 ===
 RCS file: /home/cvs/src/sys/net/if_var.h,v
 retrieving revision 1.12
 diff -u -p -r1.12 if_var.h
 --- sys/net/if_var.h  8 Jul 2014 04:02:14 -   1.12
 +++ sys/net/if_var.h  22 Nov 2014 16:25:34 -
 @@ -72,6 +72,7 @@ struct mbuf;
  struct proc;
  struct rtentry;
  struct socket;
 +struct timeout;
  struct ether_header;
  struct arpcom;
  struct rt_addrinfo;
 @@ -147,6 +148,7 @@ struct ifnet {/* and the 
 entries */
   charif_description[IFDESCRSIZE]; /* interface description */
   u_short if_rtlabelid;   /* next route label */
   u_int8_t if_priority;
 + struct  timeout *if_wdtimo; /* watchdog timeout */
  
   /* procedure handles */
   /* output routine (enqueue) */

Now that I think of it, perhaps I should have called it if_slowtimo like
the function.  Any preference?



remove useless includes in netstat

2014-11-21 Thread Mike Belopuhov
apparently these are not needed and just make my life harder.
ok?

diff --git usr.bin/netstat/if.c usr.bin/netstat/if.c
index f722db23..b07860d 100644
--- usr.bin/netstat/if.c
+++ usr.bin/netstat/if.c
@@ -36,11 +36,10 @@
 #include sys/protosw.h
 #include sys/socket.h
 #include sys/sysctl.h
 
 #include net/if.h
-#include net/if_var.h
 #include net/if_dl.h
 #include net/if_types.h
 #include net/route.h
 #include netinet/in.h
 #include netinet/in_var.h
diff --git usr.bin/netstat/mroute6.c usr.bin/netstat/mroute6.c
index b198ea4..6b17f57 100644
--- usr.bin/netstat/mroute6.c
+++ usr.bin/netstat/mroute6.c
@@ -65,11 +65,10 @@
  */
 
 #include sys/param.h
 #include sys/queue.h
 #include sys/socket.h
-#include sys/socketvar.h
 #include sys/protosw.h
 #include sys/sysctl.h
 
 #include net/if.h
 #include net/if_var.h
diff --git usr.bin/netstat/net80211.c usr.bin/netstat/net80211.c
index dc40c09..4162d59 100644
--- usr.bin/netstat/net80211.c
+++ usr.bin/netstat/net80211.c
@@ -21,11 +21,10 @@
 #include sys/socket.h
 #include sys/file.h
 #include sys/ioctl.h
 
 #include net/if.h
-#include net/if_var.h
 
 #include netinet/in.h
 #include netinet/if_ether.h
 
 #include net80211/ieee80211.h



Re: ping6 to Link Local disturbed by pf set skip?

2014-11-18 Thread Mike Belopuhov
On 18 November 2014 15:30, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 13/11/14(Thu) 16:41, Stuart Henderson wrote:
 This changes behaviour of ping6 ff02::1%pppoe0 for me, previously I saw
 a response to each icmp message in the sequence, now I just see the
 first response.  I am not using set skip on that machine.

 $ ping6 ff02::1%pppoe0
 PING6(56=40+8+8 bytes) fe80::225:90ff:fec0:77b4%pppoe0 -- ff02::1%pppoe0
 16 bytes from fe80:15::225:90ff:fec0:77b4, icmp_seq=0 hlim=64 time=0.271 ms
 ^C
 --- ff02::1%pppoe0 ping6 statistics ---
 6 packets transmitted, 1 packets received, 83.3% packet loss
 round-trip min/avg/max/std-dev = 0.271/0.271/0.271/0.000 ms

 Thanks for spotting this Stuart, diff below fixes this issue.

 It was another funky situation to debug.  Problem was that 2 new states
 were created on loopback for the echo reply instead of matching the
 corresponding echo request states.  But according to mikeb@ new states
 should never be created for replies.

 So the diff below changes the icmp multicast check in pf to take into
 consideration the scope of the sender, and in a more general way no
 longer clear the scopes of addresses for packets sent through loopback
 when calling pf_test().


To put it into the perspective: firewall policy wise we want
to differentiate between traffic on one interface and traffic
on the other interface.  Since you can have exactly the same
IPv6 link-local addresses on two interfaces that only differ
in scope IDs we want to make sure that state matching code
treats packets outgoing on these two interfaces as subjects
to different policy restrictions and therefore can't match the
same state.  With this patch we ensure that IPv6 input and
output routines provide scoped addresses to the pf and pf uses
them to create states.

Needless to say I'm OK with both diffs.



Re: VPLS patch [0/3]: introduction

2014-11-14 Thread Mike Belopuhov
On 14 November 2014 17:26, Rafael Zalamena rzalam...@gmail.com wrote:
 On Sun, Sep 14, 2014 at 11:48:11PM -0300, Rafael Zalamena wrote:
 The following mails will contain patchs that implement the VPLS datapath
 in OpenBSD. Applying all patchs should allow people to configure a
 network using VPLS manually.

 --- snipped diffs descriptions ---

 How to use:
  * Create a MPLS network.
Example: http://2011.eurobsdcon.org/papers/jeker/MPLS.pdf
  * Create a pseudowire in both ends of your network (PEs)
# ifconfig wirenumber encap ethernet wirelabel local label \
remote label neighbor other PE address controlword up

# ifconfig wire0 encap ethernet wirelabel 500 500 neighbor 1.2.3.4 up
  or
# ifconfig wire0 encap ethernet wirelabel 500 500 neighbor 1.2.3.4 \
controlword up
  * Create a bridge between the interface facing your customer (CE) and
your wireX, also in both PEs you are configuring the VPN.

 --- more comments snipped ---

 TODO list:
 * interface configuration code - SIOCSETWIRECFG / SIOCGETWIRECFG (DONE)
 * add / remove wire label (DONE)
 * add / remove wire control label (DONE)
 * ethernet-vlan support (WIP)

 Ethernet-tagged support almost complete, it's not working in the case when
 you have packets coming with 2 or more tags. I'm having problems to test
 this since 5.6 and -current doesn't do QinQ properly. (I'll be sending
 proposal diffs to fix this soon)

 * ifconfig(8) integration
 ** show wire configuration (DONE)
 ** configure wire (DONE)
 * man page:
 ** wire(4) (TODO)
 ** ifconfig(8) (TODO)

 wire(4) is depending on some other diffs now that are unrelated to this,
 please see:

 (update mpe to use rt_ifa, wire will use that too)
 http://marc.info/?l=openbsd-techm=141280528700615w=2

 (fix bridge + vlan, bridge expects the complete packet)
 http://marc.info/?l=openbsd-techm=141575896420071w=2


is it possible to call it something other than just wire(4)?
vpls maybe?



Re: improving OpenBSD's gmac.c...

2014-11-12 Thread Mike Belopuhov
On 10 October 2014 02:39, Damien Miller d...@mindrot.org wrote:
 On Thu, 9 Oct 2014, Christian Weisgerber wrote:

 John-Mark Gurney:

  I also have an implementation of ghash that does a 4 bit lookup table
  version with the table split between cache lines in p4 at:
  https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4
 
  This also has a version with does 4 blocks at a time getting a
  further speed up...

 FWIW, I did a quick  dirty merge of this into the OpenBSD tree and
 the speed of my test (net6501-50, tcpbench -u over esp aes-128-gmac)
 almost doubled.

 isn't this likely to make it more likely to be subject to timing
 attacks?


then how is this different to our table based aes implementation?
and it's the same C code as in openssl which also uses table based
gcm implementation.

what countermeasures can be applied to the table lookup code
to fight these attacks?



Re: 5.6 Icmp6 checksum / pf

2014-11-10 Thread Mike Belopuhov
On Sun, Nov 09, 2014 at 10:17 +0100, Bastien Durel wrote:
 Hi,
 
 I recently upgraded to 5.6, and got problems with icmpv6
 
 I have a gif tunnel for IPv6:
 [root@fremen root]# ifconfig gif0 
   
   
   
 gif0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1280
 description: Sixxs
 priority: 0
 groups: gif egress
 tunnel: inet 78.194.94.73 - 212.100.184.146
 inet6 fe80::200:24ff:fecf:42ac%gif0 - prefixlen 64 scopeid 0x10
 inet6 2001:6f8:202:19c::2 - 2001:6f8:202:19c::1 prefixlen 128
 
 When I initiate a ping from this interface, it works as intended:
 
 08:59:38.376107 2001:6f8:202:19c::2  2001:41d0:8:91a::1: icmp6: echo request 
 (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 64)
 08:59:38.410385 2001:41d0:8:91a::1  2001:6f8:202:19c::2: icmp6: echo reply 
 (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 57)
 08:59:39.389383 2001:6f8:202:19c::2  2001:41d0:8:91a::1: icmp6: echo request 
 (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 64)
 08:59:39.421397 2001:41d0:8:91a::1  2001:6f8:202:19c::2: icmp6: echo reply 
 (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 57)
 
 But when a ping from outside reached it, the echo reply is sent with a
 bad (0) checksum, and the packet is dropped by te other side:
 
 09:40:28.852102 2001:41d0:8:91a::1  2001:6f8:202:19c::2: icmp6: echo request 
 (id:6c10 seq:1) [icmp6 cksum ok] (len 64, hlim 57)
 09:40:28.852251 2001:6f8:202:19c::2  2001:41d0:8:91a::1: icmp6: echo reply 
 (id:6c10 seq:1) [bad icmp6 cksum 0! - 2d93] (len 64, hlim 64)
 09:40:29.860327 2001:41d0:8:91a::1  2001:6f8:202:19c::2: icmp6: echo request 
 (id:6c10 seq:2) [icmp6 cksum ok] (len 64, hlim 57)
 09:40:29.860432 2001:6f8:202:19c::2  2001:41d0:8:91a::1: icmp6: echo reply 
 (id:6c10 seq:2) [bad icmp6 cksum 0! - 8a71] (len 64, hlim 64)
 
 It works correctly with this pf rule disabled:
 pass in on gif0 reply-to ( gif0 2001:6f8:202:19c::1 ) keep state
 
 What's the correct way to handle correct return-path ?
 
 Regards,
 
 -- 
 Bastien Durel
 

hi,

looks like it's caused by the forgotten in6_proto_cksum_out call
in the pf_route6 after possible pf_map_addr/PF_ACPY manipulations.

bastien, can you please verify that the diff below fixes your
issue?

diff --git sys/net/pf.c sys/net/pf.c
index 979f44d..4c934db 100644
--- sys/net/pf.c
+++ sys/net/pf.c
@@ -5777,20 +5777,22 @@ pf_route6(struct mbuf **m, struct pf_rule *r, int dir, 
struct ifnet *oifp,
goto bad;
else if (m0 == NULL)
goto done;
if (m0-m_len  sizeof(struct ip6_hdr)) {
DPFPRINTF(LOG_ERR,
pf_route6: m0-m_len  sizeof(struct ip6_hdr));
goto bad;
}
}
 
+   in6_proto_cksum_out(m0, ifp);
+
/*
 * If the packet is too large for the outgoing interface,
 * send back an icmp6 error.
 */
if (IN6_IS_SCOPE_EMBED(dst-sin6_addr))
dst-sin6_addr.s6_addr16[1] = htons(ifp-if_index);
if ((u_long)m0-m_pkthdr.len = ifp-if_mtu) {
nd6_output(ifp, ifp, m0, dst, NULL);
} else {
in6_ifstat_inc(ifp, ifs6_in_toobig);



Re: iked responds with esp over external ips.

2014-11-05 Thread Mike Belopuhov
On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote:
 Hello!

 I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router
 running strongswan.
 The tunnel works great with ping and other traffic but traffic between the
 two external ip's dies.

 This is a site-to-site connection and nothing fancy.

 iked.conf on OpenBSD.
 ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se

 # ipsecctl -sa
 FLOWS:
 flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid
 FQDN/sippan.se dstid FQDN/sswan.sippan.se type use
 flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid
 FQDN/sippan.se dstid FQDN/sswan.sippan.se type require
 flow esp out from ::/0 to ::/0 type deny

 SAD:
 esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1
 enc aes
 esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1
 enc aes

 # netstat -nr
 Routing tables

 Internet:
 DestinationGatewayFlags   Refs  Use   Mtu  Prio
 Iface
 default130.51.23.4 UGS   10 30430256 - 8 em0
 10/8   link#5 UC 10 - 4
 vether0
 10.11.12.13fe:e1:ba:d0:d6:1c  UHLl   01 - 1 lo0
 10.255.255.255 link#5 UHLc   3  570 - 4
 vether0
 82.17.12.21  130.51.23.4 UGHD   0 30430251 - L  56 em0
 127/8  127.0.0.1  UGRS   00 32768 8 lo0
 127.0.0.1  127.0.0.1  UH 16 32768 4 lo0
 194.48.213.128/27  link#1 UC 10 - 4 em0
 130.51.23.4 00:00:cd:19:95:16  UHLc   20 - 4 em0
 130.51.23.4 00:02:b3:aa:cc:c3  HLl00 - 1 lo0
 224/4  127.0.0.1  URS00 32768 8 lo0

 Internet6:
 -removed, dont use it-

 Encap:
 Source Port  DestinationPort  Proto
 SA(Address/Proto/Type/Direction)
 192.168.4/24   0 10.11.12/240 0
 82.17.12.21/esp/use/in
 10.11.12/240 192.168.4/24   0 0
 82.17.12.21/esp/require/out
 default0 default
  0 0 none/esp/deny/out

 # tcpdump on openbsd while trying to connect with ssh to the external ip of
 the OpenBSD host from the exernal ip of the other end.

 # tcpdump host 82.17.12.21
 tcpdump: listening on em0, link-type EN10MB
 tcpdump: WARNING: compensating for unaligned libpcap packets
 16:49:55.539903 egget.priv.lamest.se.54158  loller.sippan.se.ssh: S
 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK
 (DF)
 16:49:55.539932 loller.sippan.se.ssh  egget.priv.lamest.se.54158: S
 2317435827:2317435827(0) ack 2729317718 win 16384 mss
 1240,nop,nop,sackOK,nop,wscale 3
 16:49:55.545936 egget.priv.lamest.se.54158  loller.sippan.se.ssh: . ack 1
 win 256 (DF)
 16:49:55.553927 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 190 len 100
 16:50:01.553883 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 191 len 100
 16:50:05.977468 esp egget.priv.lamest.se  loller.sippan.se spi 0x67483925
 seq 127 len 84 (DF)
 16:50:05.977519 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 192 len 84


 # tcpdump on enc0 while trying ssh and https
 tcpdump: listening on enc0, link-type ENC
 tcpdump: WARNING: compensating for unaligned libpcap packets
 17:01:01.578622 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.ssh  egget.priv.lamest.se.54158: R
 2317435850:2317435850(0) ack 2729317718 win 0 (encap)
 17:01:05.786123 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.ssh  egget.priv.lamest.se.54792: P
 3813334764:3813334785(21) ack 2711749548 win 2170 (encap)
 17:01:05.968654 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.https  egget.priv.lamest.se.54793: P
 3540908942:3540909100(158) ack 1840586787 win 2170 (encap)
 17:01:06.265543 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.https  egget.priv.lamest.se.54793: . ack 1 win 2170
 (encap)
 17:01:06.876165 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.https  egget.priv.lamest.se.54793: . ack 1 win 2170
 (encap)
 17:01:08.095189 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.https  egget.priv.lamest.se.54793: . ack 1 win 2170
 (encap)
 17:01:10.459116 (authentic,confidential): SPI 0xc31749f4:
 loller.sippan.se.https  egget.priv.lamest.se.54793: . ack 1 win 2170
 (encap)

 So it appears that OpenBSD tries to send back traffic with ESP when it
 shouldn't.

 I'd also like to add that the exact same setup works with with isakmpd.

 Best regards
 Martin



This is a known issue.  The reason why it happens is not
strictly documented.  The change to the stack was introduced
as part of the cleanup long time ago.  Supposedly it's there
to support TCP MD5 signatures, but I didn't verify that yet.

There's this code in the 

Re: iked responds with esp over external ips.

2014-11-05 Thread Mike Belopuhov
On 5 November 2014 13:28, Mike Belopuhov m...@belopuhov.com wrote:
 On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote:
 Hello!

 I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router
 running strongswan.
 The tunnel works great with ping and other traffic but traffic between the
 two external ip's dies.

 This is a site-to-site connection and nothing fancy.

 iked.conf on OpenBSD.
 ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se

 # ipsecctl -sa
 FLOWS:
 flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid
 FQDN/sippan.se dstid FQDN/sswan.sippan.se type use
 flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid
 FQDN/sippan.se dstid FQDN/sswan.sippan.se type require
 flow esp out from ::/0 to ::/0 type deny

 SAD:
 esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1
 enc aes
 esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1
 enc aes

 # netstat -nr
 Routing tables

 Internet:
 DestinationGatewayFlags   Refs  Use   Mtu  Prio
 Iface
 default130.51.23.4 UGS   10 30430256 - 8 em0
 10/8   link#5 UC 10 - 4
 vether0
 10.11.12.13fe:e1:ba:d0:d6:1c  UHLl   01 - 1 lo0
 10.255.255.255 link#5 UHLc   3  570 - 4
 vether0
 82.17.12.21  130.51.23.4 UGHD   0 30430251 - L  56 em0
 127/8  127.0.0.1  UGRS   00 32768 8 lo0
 127.0.0.1  127.0.0.1  UH 16 32768 4 lo0
 194.48.213.128/27  link#1 UC 10 - 4 em0
 130.51.23.4 00:00:cd:19:95:16  UHLc   20 - 4 em0
 130.51.23.4 00:02:b3:aa:cc:c3  HLl00 - 1 lo0
 224/4  127.0.0.1  URS00 32768 8 lo0

 Internet6:
 -removed, dont use it-

 Encap:
 Source Port  DestinationPort  Proto
 SA(Address/Proto/Type/Direction)
 192.168.4/24   0 10.11.12/240 0
 82.17.12.21/esp/use/in
 10.11.12/240 192.168.4/24   0 0
 82.17.12.21/esp/require/out
 default0 default
  0 0 none/esp/deny/out

 # tcpdump on openbsd while trying to connect with ssh to the external ip of
 the OpenBSD host from the exernal ip of the other end.

 # tcpdump host 82.17.12.21
 tcpdump: listening on em0, link-type EN10MB
 tcpdump: WARNING: compensating for unaligned libpcap packets
 16:49:55.539903 egget.priv.lamest.se.54158  loller.sippan.se.ssh: S
 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK
 (DF)
 16:49:55.539932 loller.sippan.se.ssh  egget.priv.lamest.se.54158: S
 2317435827:2317435827(0) ack 2729317718 win 16384 mss
 1240,nop,nop,sackOK,nop,wscale 3
 16:49:55.545936 egget.priv.lamest.se.54158  loller.sippan.se.ssh: . ack 1
 win 256 (DF)
 16:49:55.553927 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 190 len 100
 16:50:01.553883 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 191 len 100
 16:50:05.977468 esp egget.priv.lamest.se  loller.sippan.se spi 0x67483925
 seq 127 len 84 (DF)
 16:50:05.977519 esp loller.sippan.se  egget.priv.lamest.se spi 0xcf1f39d1
 seq 192 len 84


There's another interesting behavior evident in this trace: first few (usually
1 or 2) packets (SYN-ACK from loller here) are not encrypted.  That's because
we update our PCB on input due to the presence of a matching SA in a very agile
manner: first it doesn't really match, but we make sure next time it does
(funny logic in the ipsp_spd_inp that coupled with the default IPsec policy).



Re: Multipath for HOST p2p routes

2014-11-04 Thread Mike Belopuhov
On 4 November 2014 12:51, Martin Pieuchot mpieuc...@nolizard.org wrote:
 How are we suppose to support configuration with multiple p2p interfaces
 pointing to the same destination address?  Right now only one route to
 host is added.

 Diff below replaces a hack that move a host route from one p2p interface
 to another by always installing MPATH routes.  It assumes that multiples
 p2p interfaces configured to the same destination address will have
 different local addresses.

 ok?


After discussing it with Martin it sounds like a good idea.
There's some semi-unrelated cleanup in in_ifinit that he has
promised to commit separately.  OK mikeb for both changes.



Re: Kill in_iawithaddr()

2014-11-04 Thread Mike Belopuhov
On 4 November 2014 12:52, Martin Pieuchot mpieuc...@nolizard.org wrote:
 This function is just a wrapper around ifa_ifwithaddr() and I'd prefer
 to have less function iterating over the global list of interfaces.

 ok?



what's not immediately apparent is that it also makes sure that the
address that ifa_ifwithaddr returns is not a broadcast address. oh god.

ok for the cleanup



Re: network pool names

2014-11-04 Thread Mike Belopuhov
On 4 November 2014 13:23, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Remove pl suffix, ok?


ok with a syncache instead of syn



Re: pool page colouring

2014-11-04 Thread Mike Belopuhov
On 5 November 2014 01:12, Mike Belopuhov m...@belopuhov.com wrote:

 well, first of all, right now this is a rather theoretical gain.  we
 need to test it
 to understand if it makes things easier.

err. i meant to say go faster not easier.



Re: pool page colouring

2014-11-04 Thread Mike Belopuhov
On 5 November 2014 00:38, David Gwynne da...@gwynne.id.au wrote:

 On 30 Oct 2014, at 07:52, Ted Unangst t...@tedunangst.com wrote:

 On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote:


 i dunno. im fine with either removing colouring altogether or setting it
 from something else completely. i just want a decision to be made cos
 right now ph_color isnt set, which is a bug.

 there. i fixed it.

 looks like we were both ignorant and wrong. mikeb@ points out this from the 
 original slab paper:

 4.1. Impact of Buffer Address Distribution on Cache
 Utilization

 The address distribution of mid-size buffers can
 affect the system’s overall cache utilization. In par-
 ticular, power-of-two allocators - where all buffers
 are 2 n bytes and are 2 n -byte aligned - are pes-
 simal.* Suppose, for example, that every inode
 (∼ 300 bytes) is assigned a 512-byte buffer, 512-byte
 aligned, and that only the first dozen fields of an
 inode (48 bytes) are frequently referenced. Then
 the majority of inode-related memory traffic will be
 at addresses between 0 and 47 modulo 512. Thus
 the cache lines near 512-byte boundaries will be
 heavily loaded while the rest lie fallow. In effect
 only 9% (48/512) of the cache will be usable by
 inodes. Fully-associative caches would not suffer
 this problem, but current hardware trends are toward
 simpler rather than more complex caches.

 4.3. Slab Coloring

 The slab allocator incorporates a simple coloring
 scheme that distributes buffers evenly throughout
 the cache, resulting in excellent cache utilization
 and bus balance. The concept is simple: each time
 a new slab is created, the buffer addresses start at a
 slightly different offset (color) from the slab base
 (which is always page-aligned). For example, for a
 cache of 200-byte objects with 8-byte alignment, the
 first slab’s buffers would be at addresses 0, 200,
 400, ... relative to the slab base. The next slab’s
 buffers would be at offsets 8, 208, 408, ... and so
 on. The maximum slab color is determined by the
 amount of unused space in the slab.


 we run on enough different machines that i think we should consider this.


well, first of all, right now this is a rather theoretical gain.  we
need to test it
to understand if it makes things easier.  to see cache statistics we can use
performance counters, however current pctr code might be a bit out of date.

 so the question is if we do bring colouring back, how do we calculate it?
 arc4random? mask bits off ph_magic? atomic_inc something in the pool?
 read a counter from the pool? shift bits off the page address?

the way i read it is that you have a per-pool running value pr_color that you
increment by the item alignment or native cache line size modulo space
available for every page you are getting from uvm.  however i can see that
it might entail a problem by locating a page header (or was it page boundary?
don't have the code at hand) using simple math.



Re: pool page colouring

2014-10-29 Thread Mike Belopuhov
On 29 October 2014 22:52, Ted Unangst t...@tedunangst.com wrote:
 On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote:


 i dunno. im fine with either removing colouring altogether or setting it
 from something else completely. i just want a decision to be made cos
 right now ph_color isnt set, which is a bug.

 there. i fixed it.


so is there any performance difference?



Re: pool page colouring

2014-10-28 Thread Mike Belopuhov
On 28 October 2014 17:02, Ted Unangst t...@tedunangst.com wrote:
 On Tue, Oct 28, 2014 at 16:49, David Gwynne wrote:
 when i shuffled the locking in pools around, page colouring was
 left behind.

 page colouring is where you offset items within a page if you have
 enough slack space. the previous implementation simply incremented
 the colour so each new page got the next offset. i didnt do this
 because the page and its items are now initted outside the lock,
 so maintaining that curcolour iterator wasnt as easy.

 this sidesteps the curcolor maintenance by just having each page
 randomly pick a colour when it's set up.

 Would it make more sense to use the page address to pick the color?


Does it actually still make sense to keep page coloring?  Is there still
benefit on modern hardware?



Re: A system without interface?

2014-10-14 Thread Mike Belopuhov
On 14 October 2014 11:01, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 08/10/14(Wed) 14:29, Martin Pieuchot wrote:
 I'm looking after the uses of the global list of interface.  These ones
 are pointless, you always have at least one interface on your system.

 Ok?

 Anyone?


looks good to me. ok mikeb



Re: RTFREE - rtfree

2014-10-08 Thread Mike Belopuhov
On 8 October 2014 12:24, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Diff below kills the macro and use the fonction instead since they are
 equivalent.  It also replaces some 0 - NULL where it applies.  It does
 not include the manpage bits, I'll deal with that afterward.

 I'd appreciate reviews and oks.


although syn_cache_put doesn't really require nullification,
i'm ok with the diff.



Re: improving OpenBSD's gmac.c...

2014-10-08 Thread Mike Belopuhov
On 8 October 2014 00:48, John-Mark Gurney j...@funkthat.com wrote:
 Christian Weisgerber wrote this message on Tue, Oct 07, 2014 at 23:08 +0200:
 John-Mark Gurney:

  So, as I was working on FreeBSD's implementation of gmac.c, I noticed
  that I was able to get a significant speed up by using a mask instead
  of an if branch in ghash_gfmul in gmac.c from OpenBSD...
 
  Add a mask var and replace the code between the comments
  update Z and update V w/:
  mask = !!(x[i  3]  (1  (~i  7)));
  mask = ~(mask - 1);
 
  z[0] ^= v[0]  mask;
  z[1] ^= v[1]  mask;
  z[2] ^= v[2]  mask;
  z[3] ^= v[3]  mask;
 
  And you should see a nice performance increase...

 I tried this on a Soekris net6501-50 and the performance increase
 was around 1.3%.  (I set up an ESP transport association with
 AES-128-GMAC and pushed UDP traffic with tcpbench over it.)

 Yeh, I benchmarked the raw algo in userland, not part of IPsec.  I
 forget the resulting perf increase, but it was well more than 10-20%.

 A look at the generated amd64 assembly code shows that the change
 indeed removes a branch.  What's pretty shocking is that this code

 mul = v[3]  1;
 ...
 v[0] = (v[0]  1) ^ (0xe100 * mul);

 is turned into an actual imul instruction by GCC.  I used the same
 masking approach to get rid of the multiplication, but the improvement
 was minuscule (1%).

 Hmm. interesting...  In my code I switched both to using the and
 operator...

 I also have code which switches the registers to 64bits so that on
 amd64, we make better uses of registers, and on i386, the compilers
 are good at breaking down the 64bit registers to 32bit w/o extra
 work...

  I also have an implementation of ghash that does a 4 bit lookup table
  version with the table split between cache lines in p4 at:
  https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4

 I'll have to look at this, but haven't there been increasing
 misgivings about table implementations for GHASH because of timing
 attacks?

 Well, the code avoids one issue but introduces another issue...  Each
 table lookup accesses all four lines (assuming 64 byte cache lines), so
 there isn't a problem there...  The issue intrroduded is that the first
 64 bits of a cache line are faster to access than the remaining bits...

 For more info, look at the NSS bug:
 https://bugzilla.mozilla.org/show_bug.cgi?id=868948

 Though considering that the AES implementation that FreeBSD is still
 using is the standard table based AES, there are also timing issues
 there too...  Yes, no excuse for opening up additional windows...

 I have thought about introducing an option of slow but secure and
 fast but insecure against local attackers...  Since we are already
 on the fast but insecure against local attackers path, I figured I'll
 take the extra performance...

 As with all things, there is a trade off between speed and security...
 As most IPsec gateways don't have a local user trying to target the
 key, this is less of an issue...  And even if they did, it'd probably
 be easier to use a local root exploit to get the key than try to mount
 a timing attack to extract a key that might change in an hour or a day...

 --
   John-Mark Gurney  Voice: +1 415 225 5579

  All that I will do, has been done, All that I have, has not.


Hi,

I've talked to Theo and it looks like we'll be importing your GF2
multiplication library as is.  I think we should concentrate on
making table version of gmac.c work better.

Cheers,
Mike



Re: splnet() and SIOCSIFADDR

2014-09-03 Thread Mike Belopuhov
On 3 September 2014 15:53, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 03/09/14(Wed) 15:25, Martin Pieuchot wrote:
 Drivers that need a splnet() protection inside their SIOCSIFADDR
 generally raise the spl level themselves, so we should not need
 to do that in in{6,}_ifinit().  One exception to this rule is,
 as always, carp(4)...

 So the diff below moves the spl dance inside carp's SIOCSIFADDR
 handler, it's a baby step, so we can take care of carp_set_addr{6,}
 later.

 mikeb@ pointed out that a spl dance was missing from the SIOCDIFADDR_IN6
 ioctl, updated diff below also fixes that.


i think we're ready to step on a carefully placed landmine. OK mikeb



Re: bge(4) Jumbo support for newer chipsets

2014-09-02 Thread Mike Belopuhov
On 2 September 2014 03:54, Brad Smith b...@comstyle.com wrote:
 On Wed, Aug 27, 2014 at 02:25:27AM -0400, Brad Smith wrote:
 Looking for some testing of the following diff to add Jumbo support for the
 BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766
 chipsets.

 Here is an updated diff with bge_rxrinfo() being fixed.


get it in.  OK mikeb



Re: minphys woes

2014-09-01 Thread Mike Belopuhov
On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
 On Fri, 29 Aug 2014, Mike Belopuhov wrote:
 correct me if i'm wrong, but what happens is that bread being a block
 read reads up to MAXBSIZE which is conveniently set to 64k and you can't
 create a filesystem with a larger block size.

 physio (raw device io) however doesn't go through bread and need to know
 how split the provided buffer in separate transactions hence minphys.

 Yes, that seems to be what happens. But if every adapter needs to support
 transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
 adapter to be able to override the default minphys function with its own.
 And adapters that only support smaller transfers would need to have logic
 in their driver to be able to split the transfer into smaller chunks.


i believe that if you start digging you realise that (at least at some point)
the least common denominator is (or was) 64k meaning that even the shittiest
controller on vax can do 64k.  which means that we don't have code for a
filesystem or buffercache to probe the controller for a supported transfer
size.

 I think it makes more sense to have that logic in one place to be used by
 all drivers.

perhaps, but unless the filesystem will issue breads of larger blocks the
only benefit would be physio itself which doesn't really justify the change.



Re: reduce the number of missed PCB cache with tcpbench -su

2014-09-01 Thread Mike Belopuhov
On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote:
 What's the benefit of this?

This creates a UDP PCB per connection.  Otherwise we always rely on
matching the wildcard PCB.

 I've never seen an application do this;

I doubt that.  However, things like NTP or DNS servers usually expect
requests from everyone so they don't do it.  Now I have actually
identified one use case that this diff breaks: multiple senders won't
work.  Which is kind of annoying.

 wouldn't it be better to improve the PCB cache so it caught this case
 (which seems the usual way UDP applications behave) instead?


Definitely!  Using something like a prefix tree (mickey advocates hash
array mapped tries) would eliminate this second lookup and also solve
problems with attacking PCB hash with collisions and resizing the hash
table.

 -d




Re: minphys woes

2014-09-01 Thread Mike Belopuhov
On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote:
 On Mon, 1 Sep 2014, Mike Belopuhov wrote:

 On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
  Yes, that seems to be what happens. But if every adapter needs to support
  transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
  adapter to be able to override the default minphys function with its own.
  And adapters that only support smaller transfers would need to have logic
  in their driver to be able to split the transfer into smaller chunks.
 

 i believe that if you start digging you realise that (at least at some point)
 the least common denominator is (or was) 64k meaning that even the shittiest
 controller on vax can do 64k.  which means that we don't have code for a
 filesystem or buffercache to probe the controller for a supported transfer
 size.

 That's possible. But what is really strange is, why does openbsd then have
 an infrastructure to set different max transfer sizes for physio on a
 per-adapter basis? This makes no sense. Either the drivers have to support
 64k transfers, in which case most of the minphys infrastructure is
 useless, or they don't have to. In the latter case the minphys
 infrastructure would need to be used in all code paths.


i haven't found a controller that does less than MAXPHYS.
perhaps they meant to improve the situation but stopped short.

  I think it makes more sense to have that logic in one place to be used by
  all drivers.

 perhaps, but unless the filesystem will issue breads of larger blocks the
 only benefit would be physio itself which doesn't really justify the change.

 You lost me there. Why should this depend on how many requests some
 filesystem makes with larger blocks? If there is the possibility that
 filesystems do such requests, someone needs to do the splitting of the
 requests. The question is who. The driver or the block layer.



well, the filesystem doesn't issue requests for more than 64k at a time.
newfs won't allow you to build a 128k block file filesystem.

 BTW, for my use case I don't actually want to limit the block size, but
 rather the number of DMA segments. But the most reasonable way to do that
 seemed to set minphys to max_segments * pagesize. If we change how these
 things work, we could take the number of DMA segments into account.

can't help you with this one.



Re: reduce the number of missed PCB cache with tcpbench -su

2014-08-31 Thread Mike Belopuhov
Daniel,  don't reply anything to Damien just yet.  Can you please run
a simple test on Monday.  Try tcpbench -u -n 2 ip (as in multi-
connection test) without your patch and then with the patch and see
if behavior is changed.

Thanks

On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote:
 On Fri, 29 Aug 2014, Daniel Jakots wrote:

 Hi,

 When running tcpbench -su, a lot of them are counted as missed PCB
 cache.
 ...
 + n = recvfrom(fd, ptb-dummybuf, ptb-dummybuf_len, 0,
 + (struct sockaddr *)ss, slen);
 + if (n  0  connect(fd, (const struct sockaddr *)ss, slen))
 + warn(fail to connect);

 What's the benefit of this? I've never seen an application do this;
 wouldn't it be better to improve the PCB cache so it caught this case
 (which seems the usual way UDP applications behave) instead?

 -d




Re: minphys woes

2014-08-29 Thread Mike Belopuhov
On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote:
 On Fri, 29 Aug 2014, Miod Vallat wrote:
  sc-sc_xfer_max is computed according to the host's capabilities. What I
  want to simulate with this diff is a host adapter that can only cope with
  transfers  64k == MAXPHYS.

 Back to your original problem, you might want to print the sc_link
 struct as well the scsi_adapter struct it points to, when you detect a
 transfer larger than MAXPHYS. It has likely been overriden or reset to
 NULL by mistake at some point.

 OK, I will try that. Will take a bit until I have time, though.

 But I have also read the code and I could not find a place in the path
 from bread() to the scsi adapter cmd function where the minphys function
 is called.


correct me if i'm wrong, but what happens is that bread being a block
read reads up to MAXBSIZE which is conveniently set to 64k and you can't
create a filesystem with a larger block size.

physio (raw device io) however doesn't go through bread and need to know
how split the provided buffer in separate transactions hence minphys.



Re: newfs.8

2014-08-29 Thread Mike Belopuhov
On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote:

 is this correct? i'm not a user myself, but the text states that
 special, for mount_mfs, is typically that of the primary swap area.
 how would you even define the swap area without a disklabel?

 jmc


sort of yes.  mount_mfs(8) says this:

 [...] The special file is only used
 to read the disk label which provides a set of configuration parameters
 for the memory based file system.  The special file is typically that of
 the primary swap area, since that is where the file system will be backed
 up when free memory gets low and the memory supporting the file system
 has to be paged.  If the keyword ``swap'' is used instead of a special
 file name, default configuration parameters will be used.  (This option
 is useful when trying to use mount_mfs on a machine without any disks.)

in reality it fakes up a disklabel and proceeds.  number of XXX's
around this code is also mildly amusing...



Re: newfs.8

2014-08-29 Thread Mike Belopuhov
On 29 August 2014 13:44, Jason McIntyre j...@kerhand.co.uk wrote:
 On Fri, Aug 29, 2014 at 01:39:57PM +0200, Mike Belopuhov wrote:
 On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote:
 
  is this correct? i'm not a user myself, but the text states that
  special, for mount_mfs, is typically that of the primary swap area.
  how would you even define the swap area without a disklabel?
 
  jmc
 

 sort of yes.  mount_mfs(8) says this:

  [...] The special file is only used
  to read the disk label which provides a set of configuration parameters
  for the memory based file system.  The special file is typically that of
  the primary swap area, since that is where the file system will be 
 backed
  up when free memory gets low and the memory supporting the file system
  has to be paged.  If the keyword ``swap'' is used instead of a special
  file name, default configuration parameters will be used.  (This option
  is useful when trying to use mount_mfs on a machine without any disks.)

 in reality it fakes up a disklabel and proceeds.  number of XXX's
 around this code is also mildly amusing...


 so the diff posted was correct and should be committed?
 jmc


yes, i believe so.  OK mikeb



Re: bge(4) Jumbo support for newer chipsets

2014-08-28 Thread Mike Belopuhov
On 28 August 2014 12:32, David Gwynne da...@gwynne.id.au wrote:

 On 28 Aug 2014, at 3:02 am, Mike Belopuhov m...@belopuhov.com wrote:

 On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote:
 Looking for some testing of the following diff to add Jumbo support for the
 BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766
 chipsets.



 i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 
 (0x5719001),
 APE firmware NCSI 1.1.15.0  and Broadcom BCM5714 rev 0xa3, BCM5715
 A3 (0x9003).

 it works, however i'm not strictly a fan of switching the cluster pool to
 larger one for 5714.  wasting another 8k page (on sparc for example) for
 every rx cluster in 90% cases sounds kinda wrong to me.  but ymmv.

 this is what MCLGETI was invented to solve though. comparing pre mclgeti to 
 what this does:


that doesn't make my point invalid though.

 a 5714 right now without jumbos would have 512 rings entries with 2048 bytes 
 on each. 2048 * 512 is a 1024k of ram. if we bumped the std ring up to jumbos 
 by default, 9216 * 512 would eat 4608k of ram.


your calculation is a bit off.  it's not 9216 * 512, in case of sparc64 it's
8k * 2 * 512 which is 8M.

but my concern is different:  you ask uvm to do more work for every cluster
since now you need 2 consequent pages of memory for one cluster instead of
just one that fits 8k/2k = 4 clusters.

 my boxes with bge with mclgeti generally sit around 40 clusters, but 
 sometimes end up around 80. 80 * 9216 is 720k. we can have jumbos and still 
 be ahead.

 if you compare the nics with split rings: 512 * 2048 + 256 * 9216 is ~3.3M. 
 the same chip with mclgeti and only doing a 1500 byte workload would be 80 * 
 2048 + 17 * 9216, or 300k.


 apart from that there's a deficiency in the diff itself.  you probably want
 to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics
 look wrong.

 yeah.

 i have tested both 1500 and 9000 mtus on a 5714 and it is working well. as 
 you say, 5719 seems to be fine too, but ive only tested it with mtu 1500. ill 
 test 9k tomorrow.

 it needs tests on older chips too though.



Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu

2014-08-27 Thread Mike Belopuhov
On 27 August 2014 13:17, David Gwynne da...@gwynne.id.au wrote:
 On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote:
 On 20/08/14 8:03 PM, David Gwynne wrote:
 sthen@ says this is likely a bit optimistic. while most of our drivers 
 unconditionally configure their max mru, there's some stupid ones that 
 still interpret the configured mtu as a what the mru should be.
 
 dlg

 oce(4) and ix(4) need to be fixed.

 this might fix ix(4). anyone able to test?


ix and oce (internally though) calculate various paramethers
related to flow control, internal buffer sizes and so on based
on the maximum frame size.  therefore if i set max frame size
to 16k but always use 1,5k all these calculations will be off.
and by killing SIOCSIFMTU you make it impossible for the driver
to readjust them.

we have no idea at the moment how will this affect nic's behavior.

pieces of such black magic such as the code under
/* Hardware Packet Buffer  Flow Control setup */ comment must
be studied before making such changes.

and as i've mentioned to dlg, ix diff also switches it to using
16k cluster pool no matter what you actually need.  this can be
changed, but needs testing (i'm doing it now).



Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu

2014-08-27 Thread Mike Belopuhov
On 27 August 2014 13:23, David Gwynne da...@gwynne.id.au wrote:
 On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote:
 On 20/08/14 8:03 PM, David Gwynne wrote:
 sthen@ says this is likely a bit optimistic. while most of our drivers 
 unconditionally configure their max mru, there's some stupid ones that 
 still interpret the configured mtu as a what the mru should be.
 
 dlg

 oce(4) and ix(4) need to be fixed.

 this might fix oce. can anyone test?


works fine here.  i think we can go with this one.  OK mikeb



Re: bge(4) Jumbo support for newer chipsets

2014-08-27 Thread Mike Belopuhov
On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote:
 Looking for some testing of the following diff to add Jumbo support for the
 BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766
 chipsets.



i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 (0x5719001),
APE firmware NCSI 1.1.15.0  and Broadcom BCM5714 rev 0xa3, BCM5715
A3 (0x9003).

it works, however i'm not strictly a fan of switching the cluster pool to
larger one for 5714.  wasting another 8k page (on sparc for example) for
every rx cluster in 90% cases sounds kinda wrong to me.  but ymmv.

apart from that there's a deficiency in the diff itself.  you probably want
to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics
look wrong.

i'm certainly OK with !5714 part of the diff.



Re: pf: once for match rules?

2014-08-20 Thread Mike Belopuhov
On Tue, Aug 12, 2014 at 18:26 +0200, Mike Belopuhov wrote:
 On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote:
  Hi,
  
  Before I send a diff for pfctl to disable once on match rules,
  I've decided to try and see how much work is it to make it actually
  work.  Turns out that I need to extend pf_rule_item by 3 pointers
  to track the match rule ruleset, anchor rule and the ruleset it
  belongs to.
  
  Here's what this means in practice.  Consider a ruleset:
  
   block drop all
   match out log proto tcp to port 22 once
   anchor foo all {
 match out log proto tcp to port 22 once
 anchor bar all {
   match out log proto tcp to port 22 once
   pass out quick proto tcp to port 22 once
 }
   }
  
  Once we send a packet to port 22 the ruleset collapses to just:
  
   block drop all
  
  Thoughts?
 
 Henning thinks it's a bit of an overkill.  Any other opinions?
 

here we go then.  OK?

diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y
index c277b8d..61c2646 100644
--- sbin/pfctl/parse.y
+++ sbin/pfctl/parse.y
@@ -1488,12 +1488,18 @@ pfrule  : action dir logquick interface af 
proto fromto
if ($8.marker  FOM_SETPRIO) {
r.set_prio[0] = $8.set_prio[0];
r.set_prio[1] = $8.set_prio[1];
r.scrub_flags |= PFSTATE_SETPRIO;
}
-   if ($8.marker  FOM_ONCE)
+   if ($8.marker  FOM_ONCE) {
+   if (r.action == PF_MATCH) {
+   yyerror(can't specify once for 
+   match rules);
+   YYERROR;
+   }
r.rule_flag |= PFRULE_ONCE;
+   }
if ($8.marker  FOM_AFTO)
r.rule_flag |= PFRULE_AFTO;
r.af = $5;
 
if ($8.tag)



/dev/crypto removal (1/3): unlink pseudo device

2014-08-18 Thread Mike Belopuhov
first step is to unlink crypto(4) pseudo device from the architecture
dependant character device tables and kernel config files.

OK?

diff --git sys/arch/alpha/alpha/conf.c sys/arch/alpha/alpha/conf.c
index 83cea34..7d103af 100644
--- sys/arch/alpha/alpha/conf.c
+++ sys/arch/alpha/alpha/conf.c
@@ -190,11 +190,11 @@ struct cdevsw cdevsw[] =
 #endif
cdev_bio_init(NBIO,bio),/* 53: ioctl tunnel */
cdev_notdef(),
cdev_ptm_init(NPTY,ptm),/* 55: pseudo-tty ptm device */
cdev_hotplug_init(NHOTPLUG,hotplug), /* 56: devices hot plugging */
-   cdev_crypto_init(NCRYPTO,crypto), /* 57: /dev/crypto */
+   cdev_notdef(),  /* 57: was: /dev/crypto */
cdev_bktr_init(NBKTR,bktr), /* 58: Bt848 video capture device */
cdev_radio_init(NRADIO,radio), /* 59: generic radio I/O */
cdev_mouse_init(NWSMUX, wsmux), /* 60: ws multiplexor */
cdev_vscsi_init(NVSCSI, vscsi), /* 61: vscsi */
cdev_notdef(),
diff --git sys/arch/alpha/conf/GENERIC sys/arch/alpha/conf/GENERIC
index 7b0c790..dbc8bdc 100644
--- sys/arch/alpha/conf/GENERIC
+++ sys/arch/alpha/conf/GENERIC
@@ -422,8 +422,7 @@ option ONEWIREVERBOSE
 owid*  at onewire? # ID
 owsbm* at onewire? # Smart Battery Monitor
 owtemp* at onewire?# Temperature
 owctr* at onewire? # Counter device
 
-pseudo-device  crypto  1
 pseudo-device  hotplug 1   # devices hot plugging
 pseudo-device  wsmux   2   # mouse  keyboard multiplexor
diff --git sys/arch/amd64/amd64/conf.c sys/arch/amd64/amd64/conf.c
index 59d4c9b..5da5820 100644
--- sys/arch/amd64/amd64/conf.c
+++ sys/arch/amd64/amd64/conf.c
@@ -38,12 +38,10 @@
 #include sys/tty.h
 #include sys/vnode.h
 
 #include machine/conf.h
 
-#include inet.h
-
 #include wd.h
 bdev_decl(wd);
 #include fdc.h
 #include fd.h
 bdev_decl(fd);
@@ -254,11 +252,11 @@ struct cdevsw cdevsw[] =
cdev_tty_init(NUCOM,ucom),  /* 66: USB tty */
cdev_mouse_init(NWSKBD, wskbd), /* 67: keyboards */
cdev_mouse_init(NWSMOUSE,   /* 68: mice */
wsmouse),
cdev_mouse_init(NWSMUX, wsmux), /* 69: ws multiplexor */
-   cdev_crypto_init(NCRYPTO,crypto), /* 70: /dev/crypto */
+   cdev_notdef(),  /* 70: was: /dev/crypto */
cdev_tty_init(NCZ,cztty),   /* 71: Cyclades-Z serial port */
 #ifdef USER_PCICONF
cdev_pci_init(NPCI,pci),/* 72: PCI user */
 #else
cdev_notdef(),
diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
index 09fd692..3177dd8 100644
--- sys/arch/amd64/conf/GENERIC
+++ sys/arch/amd64/conf/GENERIC
@@ -600,11 +600,10 @@ pseudo-device pctr1
 pseudo-device  nvram   1
 pseudo-device  hotplug 1   # devices hot plugging
 
 # mouse  keyboard multiplexor pseudo-devices
 pseudo-device  wsmux   2
-pseudo-device  crypto  1
 
 # Virtio devices
 virtio*at pci? # Virtio PCI device
 vioblk*at virtio?  # Virtio block device
 vio*   at virtio?  # Virtio network device
diff --git sys/arch/arm/arm/conf.c sys/arch/arm/arm/conf.c
index d381885..8f12732 100644
--- sys/arch/arm/arm/conf.c
+++ sys/arch/arm/arm/conf.c
@@ -53,12 +53,10 @@
 #include sys/conf.h
 #include sys/vnode.h
 
 #include machine/conf.h
 
-#include inet.h
-
 /*
  * From this point, these need to be MI foo.h files.
  */
 
 /*
@@ -321,11 +319,11 @@ struct cdevsw cdevsw[] = {
cdev_lkm_dummy(),   /* 42: reserved */
cdev_lkm_dummy(),   /* 43: reserved */
cdev_lkm_dummy(),   /* 44: reserved */
cdev_lkm_dummy(),   /* 45: reserved */
cdev_pf_init(NPF,pf),   /* 46: packet filter */
-   cdev_crypto_init(NCRYPTO,crypto),   /* 47: /dev/crypto */
+   cdev_notdef(),  /* 47: was: /dev/crypto */
cdev_lkm_dummy(),   /* 48: reserved */
cdev_lkm_dummy(),   /* 49: reserved */
cdev_systrace_init(NSYSTRACE,systrace), /* 50: system call tracing */
cdev_notdef(),  /* 51: reserved */
cdev_bio_init(NBIO,bio),/* 52: ioctl tunnel */
diff --git sys/arch/armish/conf/GENERIC sys/arch/armish/conf/GENERIC
index e72ff41..6bf908c 100644
--- sys/arch/armish/conf/GENERIC
+++ sys/arch/armish/conf/GENERIC
@@ -195,7 +195,6 @@ owsbm*  at onewire? # Smart Battery Monitor
 owtemp* at onewire?# Temperature
 owctr* at onewire? # Counter device
 
 # mouse  keyboard multiplexor pseudo-devices
 pseudo-device  wsmux   2
-pseudo-device  crypto  1
 pseudo-device  hotplug 1   # devices hot plugging
diff --git sys/arch/hppa/hppa/conf.c sys/arch/hppa/hppa/conf.c

/dev/crypto removal (2/3): remove kernel support

2014-08-18 Thread Mike Belopuhov
this removes /dev/crypto device interface and public key functionality
that is only usable via /dev/crypto.

OK?

diff --git sys/conf/files sys/conf/files
index 3941639..9af78cc 100644
--- sys/conf/files
+++ sys/conf/files
@@ -870,11 +870,10 @@ file crypto/blf.c (inet  ipsec) | crypto 
| vnd
 file crypto/cast.c (inet  ipsec) | crypto
 file crypto/ecb_enc.c  (inet  ipsec) | crypto
 file crypto/set_key.c  (inet  ipsec) | crypto
 file crypto/ecb3_enc.c (inet  ipsec) | crypto
 file crypto/crypto.c   (inet  ipsec) | crypto
-file crypto/cryptodev.c((inet  ipsec) | crypto)   
needs-flag
 file crypto/criov.c(inet  ipsec) | crypto
 file crypto/cryptosoft.c   (inet  ipsec) | crypto
 file crypto/xform.c(inet  ipsec) | crypto
 file crypto/xform_ipcomp.c (inet  ipsec) | crypto
 file crypto/arc4.c 
diff --git sys/crypto/crypto.c sys/crypto/crypto.c
index f62752d..4a11ae3 100644
--- sys/crypto/crypto.c
+++ sys/crypto/crypto.c
@@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags)
 /*
  * Register a crypto driver. It should be called once for each algorithm
  * supported by the driver.
  */
 int
-crypto_kregister(u_int32_t driverid, int *kalg,
-int (*kprocess)(struct cryptkop *))
-{
-   int s, i;
-
-   if (driverid = crypto_drivers_num || kalg  == NULL ||
-   crypto_drivers == NULL)
-   return EINVAL;
-
-   s = splvm();
-
-   for (i = 0; i = CRK_ALGORITHM_MAX; i++) {
-   /*
-* XXX Do some performance testing to determine
-* placing.  We probably need an auxiliary data
-* structure that describes relative performances.
-*/
-
-   crypto_drivers[driverid].cc_kalg[i] = kalg[i];
-   }
-
-   crypto_drivers[driverid].cc_kprocess = kprocess;
-
-   splx(s);
-   return 0;
-}
-
-/* Register a crypto driver. */
-int
 crypto_register(u_int32_t driverid, int *alg,
 int (*newses)(u_int32_t *, struct cryptoini *),
 int (*freeses)(u_int64_t), int (*process)(struct cryptop *))
 {
int s, i;
@@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp)
}
 
return 0;
 }
 
-int
-crypto_kdispatch(struct cryptkop *krp)
-{
-   if (crypto_taskq) {
-   task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL);
-   task_add(crypto_taskq, krp-krp_task);
-   } else {
-   crypto_kinvoke(krp);
-   }
-
-   return 0;
-}
-
-/*
- * Dispatch an asymmetric crypto request to the appropriate crypto devices.
- */
-int
-crypto_kinvoke(struct cryptkop *krp)
-{
-   extern int cryptodevallowsoft;
-   u_int32_t hid;
-   int error;
-   int s;
-
-   /* Sanity checks. */
-   if (krp == NULL || krp-krp_callback == NULL)
-   return (EINVAL);
-
-   s = splvm();
-   for (hid = 0; hid  crypto_drivers_num; hid++) {
-   if ((crypto_drivers[hid].cc_flags  CRYPTOCAP_F_SOFTWARE) 
-   cryptodevallowsoft == 0)
-   continue;
-   if (crypto_drivers[hid].cc_kprocess == NULL)
-   continue;
-   if ((crypto_drivers[hid].cc_kalg[krp-krp_op] 
-   CRYPTO_ALG_FLAG_SUPPORTED) == 0)
-   continue;
-   break;
-   }
-
-   if (hid == crypto_drivers_num) {
-   krp-krp_status = ENODEV;
-   crypto_kdone(krp);
-   splx(s);
-   return (0);
-   }
-
-   krp-krp_hid = hid;
-
-   crypto_drivers[hid].cc_koperations++;
-
-   error = crypto_drivers[hid].cc_kprocess(krp);
-   if (error) {
-   krp-krp_status = error;
-   crypto_kdone(krp);
-   }
-   splx(s);
-   return (0);
-}
-
 /*
  * Dispatch a crypto request to the appropriate crypto devices.
  */
 int
 crypto_invoke(struct cryptop *crp)
@@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp)
task_set(crp-crp_task, (void (*))crp-crp_callback,
crp, NULL);
task_add(crypto_taskq, crp-crp_task);
}
 }
-
-/*
- * Invoke the callback on behalf of the driver.
- */
-void
-crypto_kdone(struct cryptkop *krp)
-{
-   task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL);
-   task_add(crypto_taskq, krp-krp_task);
-}
-
-int
-crypto_getfeat(int *featp)
-{
-   extern int cryptodevallowsoft, userasymcrypto;
-   int hid, kalg, feat = 0;
-
-   if (userasymcrypto == 0)
-   goto out; 
-   for (hid = 0; hid  crypto_drivers_num; hid++) {
-   if ((crypto_drivers[hid].cc_flags  CRYPTOCAP_F_SOFTWARE) 
-   cryptodevallowsoft == 0) {
-   continue;
-   }
-   if 

/dev/crypto removal (3/3): userland bits

2014-08-18 Thread Mike Belopuhov
please note that the commented out example usage in etc/MAKEDEV.common
remains till someone feels the need to change it.

OK?

diff --git etc/MAKEDEV.common etc/MAKEDEV.common
index bfcd943..b656d46 100644
--- etc/MAKEDEV.common
+++ etc/MAKEDEV.common
@@ -131,11 +131,10 @@ target(all, wd, 0, 1, 2, 3)dnl
 target(all, xd, 0, 1, 2, 3)dnl
 target(all, systrace)dnl
 target(all, pctr)dnl
 target(all, pctr0)dnl
 target(all, pf)dnl
-twrget(all, cry, crypto)dnl
 target(all, apm)dnl
 target(all, acpi)dnl
 twrget(all, tth, ttyh, 0, 1)dnl
 target(all, ttyA, 0, 1)dnl
 target(all, ttyB, 0, 1, 2, 3, 4, 5)dnl
@@ -446,12 +445,10 @@ __devitem(fdesc, fd, fd/* nodes, fd)dnl
 _mkdev(fdesc, fd, {-RMlist[${#RMlist[*]}]=;mkdir -p fd;rm -f n=0
while [ $n -lt 64 ];do M fd/$n c major_fdesc_c $n;n=Add($n, 1);done
MKlist[${#MKlist[*]}]=;chmod 555 fd-})dnl
 __devitem(oppr, openprom,PROM settings,openprom)dnl
 _cdev(oppr, openprom, 70, 0)dnl
-__devitem(cry, crypto, Hardware crypto access driver,crypto)dnl
-_mkdev(cry, crypto, {-M crypto c major_cry_c-} 0)dnl
 __devitem(pf, pf*, Packet Filter)dnl
 _mkdev(pf, {-pf*-}, {-M pf c major_pf_c 0 600-})dnl
 __devitem(bpf, bpf*, Berkeley Packet Filter)dnl
 _mkdev(bpf, {-bpf*-}, {-M bpf$U c major_bpf_c $U 600-}, 600)dnl
 _mkdev(tun, {-tun*-}, {-M tun$U c major_tun_c $U 600-}, 600)dnl
diff --git etc/etc.alpha/MAKEDEV etc/etc.alpha/MAKEDEV
index 2dd9859..050aa4e 100644
--- etc/etc.alpha/MAKEDEV
+++ etc/etc.alpha/MAKEDEV
@@ -68,11 +68,10 @@
 # Special purpose devices:
 #  audio*  Audio devices
 #  bio ioctl tunnel pseudo-device
 #  bktr*   Video frame grabbers
 #  bpf*Berkeley Packet Filter
-#  crypto  Hardware crypto access driver
 #  diskmap Disk mapper
 #  fd  fd/* nodes
 #  fuseUserland Filesystem
 #  hotplug devices hot plugging
 #  lkm Loadable kernel modules interface
@@ -324,14 +323,10 @@ fd)
 
 diskmap)
M diskmap c 63 0 640 operator
;;
 
-crypto)
-   M crypto c 57 0
-   ;;
-
 bpf*)
M bpf$U c 11 $U 600
;;
 
 bktr*)
@@ -539,11 +534,11 @@ all)
R local wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2
R rmidi3 rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker
R video0 video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm
R tty00 tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09
R tty0a tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7
-   R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 crypto pf systrace wd0
+   R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 pf systrace wd0
R wd1 wd2 wd3 std st0 st1 fd
;;
 
 wd*|sd*)
case $i in
diff --git etc/etc.amd64/MAKEDEV etc/etc.amd64/MAKEDEV
index 2971021..4a0d05c 100644
--- etc/etc.amd64/MAKEDEV
+++ etc/etc.amd64/MAKEDEV
@@ -68,11 +68,10 @@
 #  apm Power Management Interface
 #  audio*  Audio devices
 #  bio ioctl tunnel pseudo-device
 #  bktr*   Video frame grabbers
 #  bpf*Berkeley Packet Filter
-#  crypto  Hardware crypto access driver
 #  diskmap Disk mapper
 #  drm*Direct Rendering Manager
 #  fd  fd/* nodes
 #  fuseUserland Filesystem
 #  gpio*   General Purpose Input/Output
@@ -344,14 +343,10 @@ drm*)
 
 diskmap)
M diskmap c 90 0 640 operator
;;
 
-crypto)
-   M crypto c 70 0
-   ;;
-
 bpf*)
M bpf$U c 23 $U 600
;;
 
 bktr*)
@@ -565,11 +560,11 @@ all)
R wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3
R rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker video0
R video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm tty00
R tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09 tty0a
R tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7 apm
-   R crypto pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd
+   R pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd
;;
 
 wd*|sd*)
case $i in
wd*) dodisk wd $U 0 3 $U 0;;
diff --git etc/etc.armish/MAKEDEV etc/etc.armish/MAKEDEV
index 7a3a21e..25fd4d4 100644
--- etc/etc.armish/MAKEDEV
+++ etc/etc.armish/MAKEDEV
@@ -62,11 +62,10 @@
 #  apm Power management device
 #  audio*  Audio devices
 #  bio ioctl tunnel pseudo-device
 #  bktr*   Video frame grabbers
 #  bpf*Berkeley Packet Filter
-#  crypto  Hardware crypto access driver
 #  diskmap Disk mapper
 #  fd  fd/* nodes
 #  fuseUserland Filesystem
 #  hotplug devices hot plugging
 #  lkm Loadable kernel modules interface
@@ -302,14 +301,10 @@ fd)
 
 diskmap)
M diskmap c 102 0 640 operator
;;
 
-crypto)
-   M crypto c 47 0
-   ;;
-
 bpf*)
M bpf$U c 22 $U 600
;;
 
 bktr*)
@@ -481,11 +476,11 @@ all)
R bpf5 bpf6 bpf7 bpf8 bpf9 pty0 diskmap vscsi0 ch0 audio0
R audio1 audio2 fuse pppx hotplug ptm local wscons pci0 pci1
R pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3 rmidi4 rmidi5
  

Re: /dev/crypto removal (2/3): remove kernel support

2014-08-18 Thread Mike Belopuhov
On Mon, Aug 18, 2014 at 16:03 +0200, Mike Belopuhov wrote:
 this removes /dev/crypto device interface and public key functionality
 that is only usable via /dev/crypto.
 
 OK?
 

minor correction: preserve #ifdef _KERNEL in the cryptodev.h.  while
there are no userland programs including it atm leaving it for the
symmetry with cryptosoft.c seems like a good idea.

diff --git sys/conf/files sys/conf/files
index 3941639..9af78cc 100644
--- sys/conf/files
+++ sys/conf/files
@@ -870,11 +870,10 @@ file crypto/blf.c (inet  ipsec) | crypto 
| vnd
 file crypto/cast.c (inet  ipsec) | crypto
 file crypto/ecb_enc.c  (inet  ipsec) | crypto
 file crypto/set_key.c  (inet  ipsec) | crypto
 file crypto/ecb3_enc.c (inet  ipsec) | crypto
 file crypto/crypto.c   (inet  ipsec) | crypto
-file crypto/cryptodev.c((inet  ipsec) | crypto)   
needs-flag
 file crypto/criov.c(inet  ipsec) | crypto
 file crypto/cryptosoft.c   (inet  ipsec) | crypto
 file crypto/xform.c(inet  ipsec) | crypto
 file crypto/xform_ipcomp.c (inet  ipsec) | crypto
 file crypto/arc4.c 
diff --git sys/crypto/crypto.c sys/crypto/crypto.c
index f62752d..4a11ae3 100644
--- sys/crypto/crypto.c
+++ sys/crypto/crypto.c
@@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags)
 /*
  * Register a crypto driver. It should be called once for each algorithm
  * supported by the driver.
  */
 int
-crypto_kregister(u_int32_t driverid, int *kalg,
-int (*kprocess)(struct cryptkop *))
-{
-   int s, i;
-
-   if (driverid = crypto_drivers_num || kalg  == NULL ||
-   crypto_drivers == NULL)
-   return EINVAL;
-
-   s = splvm();
-
-   for (i = 0; i = CRK_ALGORITHM_MAX; i++) {
-   /*
-* XXX Do some performance testing to determine
-* placing.  We probably need an auxiliary data
-* structure that describes relative performances.
-*/
-
-   crypto_drivers[driverid].cc_kalg[i] = kalg[i];
-   }
-
-   crypto_drivers[driverid].cc_kprocess = kprocess;
-
-   splx(s);
-   return 0;
-}
-
-/* Register a crypto driver. */
-int
 crypto_register(u_int32_t driverid, int *alg,
 int (*newses)(u_int32_t *, struct cryptoini *),
 int (*freeses)(u_int64_t), int (*process)(struct cryptop *))
 {
int s, i;
@@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp)
}
 
return 0;
 }
 
-int
-crypto_kdispatch(struct cryptkop *krp)
-{
-   if (crypto_taskq) {
-   task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL);
-   task_add(crypto_taskq, krp-krp_task);
-   } else {
-   crypto_kinvoke(krp);
-   }
-
-   return 0;
-}
-
-/*
- * Dispatch an asymmetric crypto request to the appropriate crypto devices.
- */
-int
-crypto_kinvoke(struct cryptkop *krp)
-{
-   extern int cryptodevallowsoft;
-   u_int32_t hid;
-   int error;
-   int s;
-
-   /* Sanity checks. */
-   if (krp == NULL || krp-krp_callback == NULL)
-   return (EINVAL);
-
-   s = splvm();
-   for (hid = 0; hid  crypto_drivers_num; hid++) {
-   if ((crypto_drivers[hid].cc_flags  CRYPTOCAP_F_SOFTWARE) 
-   cryptodevallowsoft == 0)
-   continue;
-   if (crypto_drivers[hid].cc_kprocess == NULL)
-   continue;
-   if ((crypto_drivers[hid].cc_kalg[krp-krp_op] 
-   CRYPTO_ALG_FLAG_SUPPORTED) == 0)
-   continue;
-   break;
-   }
-
-   if (hid == crypto_drivers_num) {
-   krp-krp_status = ENODEV;
-   crypto_kdone(krp);
-   splx(s);
-   return (0);
-   }
-
-   krp-krp_hid = hid;
-
-   crypto_drivers[hid].cc_koperations++;
-
-   error = crypto_drivers[hid].cc_kprocess(krp);
-   if (error) {
-   krp-krp_status = error;
-   crypto_kdone(krp);
-   }
-   splx(s);
-   return (0);
-}
-
 /*
  * Dispatch a crypto request to the appropriate crypto devices.
  */
 int
 crypto_invoke(struct cryptop *crp)
@@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp)
task_set(crp-crp_task, (void (*))crp-crp_callback,
crp, NULL);
task_add(crypto_taskq, crp-crp_task);
}
 }
-
-/*
- * Invoke the callback on behalf of the driver.
- */
-void
-crypto_kdone(struct cryptkop *krp)
-{
-   task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL);
-   task_add(crypto_taskq, krp-krp_task);
-}
-
-int
-crypto_getfeat(int *featp)
-{
-   extern int cryptodevallowsoft, userasymcrypto;
-   int hid, kalg, feat = 0;
-
-   if (userasymcrypto == 0)
-   goto out; 
-   for (hid

/dev/crypto removal (3bis): DTYPE_CRYPTO

2014-08-18 Thread Mike Belopuhov
I don't know if we recycle them somehow, but just in case...

diff --git sys/sys/file.h sys/sys/file.h
index d98118e..64c0f31 100644
--- sys/sys/file.h
+++ sys/sys/file.h
@@ -67,11 +67,11 @@ struct file {
short   f_flag; /* see fcntl.h */
 #defineDTYPE_VNODE 1   /* file */
 #defineDTYPE_SOCKET2   /* communications endpoint */
 #defineDTYPE_PIPE  3   /* pipe */
 #defineDTYPE_KQUEUE4   /* event queue */
-#defineDTYPE_CRYPTO5   /* crypto */
+/* was define  DTYPE_CRYPTO5 */
 #defineDTYPE_SYSTRACE  6   /* system call tracing */
short   f_type; /* descriptor type */
longf_count;/* reference count */
longf_msgcount; /* references from message queue */
struct  ucred *f_cred;  /* credentials associated with descriptor */



Re: Kill MRT_{ADD,DEL}_BW_UPCALL

2014-08-13 Thread Mike Belopuhov
On 13 August 2014 10:56, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Our multicast routing code is insert your adjective and for the most
 part unused.  We discussed with claudio@ during t2k13 to rewrite only
 the parts that people currently use, any volunteer?

 In the meantime, I'd like to kill the obviously unused parts of it.  So
 here's a first diff that remove the bandwidth monitoring interface.
 Nothing use it in base and a quick search on codesearch.debian.net
 reveals that only net/xorp picks it if it finds the defines.

 Ok?


OK



Re: Fix pppoe(4) with rdomain != 0

2014-08-13 Thread Mike Belopuhov
OK

On 13 August 2014 11:56, Martin Pieuchot mpieuc...@nolizard.org wrote:
 ok?

 Index: net/if_pppoe.c
 ===
 RCS file: /home/ncvs/src/sys/net/if_pppoe.c,v
 retrieving revision 1.40
 diff -u -p -r1.40 if_pppoe.c
 --- net/if_pppoe.c  12 Jul 2014 18:44:22 -  1.40
 +++ net/if_pppoe.c  13 Aug 2014 09:56:16 -
 @@ -1398,6 +1398,9 @@ pppoe_send_padt(struct ifnet *outgoing_i
 memcpy(eh-ether_dhost, dest, ETHER_ADDR_LEN);

 m0-m_flags = ~(M_BCAST|M_MCAST);
 +   /* encapsulated packet is forced into rdomain of physical interface */
 +   m0-m_pkthdr.ph_rtableid = outgoing_if-if_rdomain;
 +
 return (outgoing_if-if_output(outgoing_if, m0, dst, NULL));
  }





[regress] convert aes testcase from /dev/crypto

2014-08-13 Thread Mike Belopuhov
in order to deprecate crypto(4) interface regress tests need to be
converted.

this aes test case actually uses ecb vectors, therefore no chaining
is required and the code looks very simple.

OK?

diff --git regress/sys/crypto/aes/Makefile regress/sys/crypto/aes/Makefile
index 459aedb..826d98c 100644
--- regress/sys/crypto/aes/Makefile
+++ regress/sys/crypto/aes/Makefile
@@ -1,9 +1,13 @@
 #   $OpenBSD: Makefile,v 1.2 2014/01/18 05:54:52 martynas Exp $
 
-PROG=   aestest
+DIR=   ${.CURDIR}/../../../../sys
+
+CFLAGS+=   -I${DIR}
 
+PROG=   aestest
+SRCS=  aestest.c
 CDIAGFLAGS=-Wall
 #CDIAGFLAGS+=  -Werror
 CDIAGFLAGS+=   -Wpointer-arith
 CDIAGFLAGS+=   -Wno-uninitialized
 CDIAGFLAGS+=   -Wstrict-prototypes
@@ -12,9 +16,12 @@ CDIAGFLAGS+= -Wunused
 CDIAGFLAGS+=   -Wsign-compare
 CDIAGFLAGS+=   -Wshadow
 
 REGRESS_ROOT_TARGETS=  run-regress-${PROG}
 
+.PATH: ${DIR}/crypto
+SRCS+= rijndael.c
+
 run-regress-${PROG}: ${PROG}
-   ${SUDO} ./${PROG} ${.CURDIR}/vectors/*.txt
+   ./${PROG} ${.CURDIR}/vectors/*.txt
 
 .include bsd.regress.mk
diff --git regress/sys/crypto/aes/aestest.c regress/sys/crypto/aes/aestest.c
index 2437c38..720dbc1 100644
--- regress/sys/crypto/aes/aestest.c
+++ regress/sys/crypto/aes/aestest.c
@@ -24,117 +24,39 @@
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 /*
- * Test crypto(4) AES with test vectors provided by Dr Brian Gladman:
- * http://fp.gladman.plus.com/AES/
+ * Test kernel AES implementation with test vectors provided by
+ * Dr Brian Gladman:  http://fp.gladman.plus.com/AES/
  */
 
-#include sys/types.h
 #include sys/param.h
-#include sys/ioctl.h
-#include sys/sysctl.h
-#include crypto/cryptodev.h
+#include crypto/rijndael.h
 #include err.h
-#include fcntl.h
 #include stdio.h
 #include stdlib.h
 #include string.h
 #include unistd.h
 #include ctype.h
 
 static int
-syscrypt(const unsigned char *key, size_t klen, const unsigned char *in,
+docrypt(const unsigned char *key, size_t klen, const unsigned char *in,
 unsigned char *out, size_t len, int do_encrypt)
 {
-   struct session_op session;
-   struct crypt_op cryp;
-   int cryptodev_fd = -1, fd = -1;
-   u_char iv[32];
-
-   /*
-* Kludge; the kernel doesn't support ECB encryption so we
-* use a all-zero IV and encrypt a single block only, so the
-* result should be the same.
-*/
-   bzero(iv, sizeof(iv));
-
-   if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0))  0) {
-   warn(/dev/crypto);
-   goto err;
-   }
-   if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) {
-   warn(CRIOGET failed);
-   goto err;
-   }
-   memset(session, 0, sizeof(session));
-   session.cipher = CRYPTO_AES_CBC;
-   session.key = (caddr_t) key;
-   session.keylen = klen;
-   if (ioctl(fd, CIOCGSESSION, session) == -1) {
-   warn(CIOCGSESSION);
-   goto err;
-   }
-   memset(cryp, 0, sizeof(cryp));
-   cryp.ses = session.ses;
-   cryp.op = do_encrypt ? COP_ENCRYPT : COP_DECRYPT;
-   cryp.flags = 0;
-   cryp.len = len;
-   cryp.src = (caddr_t) in;
-   cryp.dst = (caddr_t) out;
-   cryp.iv = (caddr_t) iv;
-   cryp.mac = 0;
-   if (ioctl(fd, CIOCCRYPT, cryp) == -1) {
-   warn(CIOCCRYPT);
-   goto err;
-   }
-   if (ioctl(fd, CIOCFSESSION, session.ses) == -1) {
-   warn(CIOCFSESSION);
-   goto err;
-   }
-   close(fd);
-   close(cryptodev_fd);
-   return (0);
-
-err:
-   if (fd != -1)
-   close(fd);
-   if (cryptodev_fd != -1)
-   close(cryptodev_fd);
-   return (-1);
-}
-
-static int
-getallowsoft(void)
-{
-   int mib[2], old;
-   size_t olen;
-
-   olen = sizeof(old);
-
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
-   if (sysctl(mib, 2, old, olen, NULL, 0)  0)
-   err(1, sysctl failed);
-
-   return old;
-}
-
-static void
-setallowsoft(int new)
-{
-   int mib[2], old;
-   size_t olen, nlen;
-
-   olen = nlen = sizeof(new);
-
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
-
-   if (sysctl(mib, 2, old, olen, new, nlen)  0)
-   err(1, sysctl failed);
+   rijndael_ctx ctx;
+   int error = 0;
+
+   memset(ctx, 0, sizeof(ctx));
+   error = rijndael_set_key(ctx, key, klen * 8);
+   if (error)
+   return -1;
+   if (do_encrypt)
+   rijndael_encrypt(ctx, in, out);
+   else
+   rijndael_decrypt(ctx, in, out);
+   return 0;
 }
 
 static int
 match(unsigned char *a, unsigned char *b, size_t len)
 {
@@ -221,21 +143,21 @@ do_tests(const char *filename, int test_num, u_char *key, 
u_int keylen,
 {
char result[32];
int fail = 0;
 
/* Encrypt test */

[regress] convert aes-ctr test from /dev/crypto

2014-08-13 Thread Mike Belopuhov
this test is converted the same way jsing@ has recently converted
an xts test by pulling in xform.c code.

OK?

diff --git regress/sys/crypto/aesctr/Makefile regress/sys/crypto/aesctr/Makefile
index 31ae500..7310dbc 100644
--- regress/sys/crypto/aesctr/Makefile
+++ regress/sys/crypto/aesctr/Makefile
@@ -1,10 +1,29 @@
 #   $OpenBSD: Makefile,v 1.1 2005/05/25 05:47:53 markus Exp $
 
+DIR=   ${.CURDIR}/../../../../sys
+
+CFLAGS+=   -I${DIR}
+
 PROG=   aesctr
+SRCS=  aesctr.c
+
+CDIAGFLAGS=-Wall
+CDIAGFLAGS+=   -Werror
+CDIAGFLAGS+=   -Wpointer-arith
+CDIAGFLAGS+=   -Wno-uninitialized
+CDIAGFLAGS+=   -Wstrict-prototypes
+CDIAGFLAGS+=   -Wmissing-prototypes
+CDIAGFLAGS+=   -Wunused
+CDIAGFLAGS+=   -Wsign-compare
+CDIAGFLAGS+=   -Wshadow
 
 REGRESS_ROOT_TARGETS=  run-regress-${PROG}
 
+.PATH: ${DIR}/crypto
+SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c
+SRCS+= xform.c
+
 run-regress-${PROG}: ${PROG}
-   ${SUDO} ./${PROG}
+   ./${PROG}
 
 .include bsd.regress.mk
diff --git regress/sys/crypto/aesctr/aesctr.c regress/sys/crypto/aesctr/aesctr.c
index 4cc1a6e..3a0b4d1 100644
--- regress/sys/crypto/aesctr/aesctr.c
+++ regress/sys/crypto/aesctr/aesctr.c
@@ -14,17 +14,13 @@
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
-#include sys/types.h
 #include sys/param.h
-#include sys/ioctl.h
-#include sys/sysctl.h
-#include crypto/cryptodev.h
+#include crypto/rijndael.h
 #include err.h
-#include fcntl.h
 #include stdio.h
 #include stdlib.h
 #include string.h
 #include unistd.h
 #include limits.h
@@ -128,92 +124,67 @@ struct {
B4 07 DF 86 65 69 FD 07 F4 8C C0 B5 83 D6 07 1F
/*1E C0 E6 B8*/,
},
 };
 
-static int
-syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv,
-const unsigned char *in, unsigned char *out, size_t len, int encrypt)
-{
-   struct session_op session;
-   struct crypt_op cryp;
-   int cryptodev_fd = -1, fd = -1;
+/* Stubs */
 
-   if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0))  0) {
-   warn(/dev/crypto);
-   goto err;
-   }
-   if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) {
-   warn(CRIOGET failed);
-   goto err;
-   }
-   memset(session, 0, sizeof(session));
-   session.cipher = CRYPTO_AES_CTR;
-   session.key = (caddr_t) key;
-   session.keylen = klen;
-   if (ioctl(fd, CIOCGSESSION, session) == -1) {
-   warn(CIOCGSESSION);
-   goto err;
-   }
-   memset(cryp, 0, sizeof(cryp));
-   cryp.ses = session.ses;
-   cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT;
-   cryp.flags = 0;
-   cryp.len = len;
-   cryp.src = (caddr_t) in;
-   cryp.dst = (caddr_t) out;
-   cryp.iv = (caddr_t) iv;
-   cryp.mac = 0;
-   if (ioctl(fd, CIOCCRYPT, cryp) == -1) {
-   warn(CIOCCRYPT);
-   goto err;
-   }
-   if (ioctl(fd, CIOCFSESSION, session.ses) == -1) {
-   warn(CIOCFSESSION);
-   goto err;
-   }
-   close(fd);
-   close(cryptodev_fd);
-   return (0);
+u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **);
 
-err:
-   if (fd != -1)
-   close(fd);
-   if (cryptodev_fd != -1)
-   close(cryptodev_fd);
-   return (-1);
+u_int32_t
+deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out)
+{
+   return 0;
 }
 
-static int
-getallowsoft(void)
+void   explicit_bzero(void *, size_t);
+
+void
+explicit_bzero(void *b, size_t len)
 {
-   int mib[2], old;
-   size_t olen;
+   bzero(b, len);
+}
 
-   olen = sizeof(old);
+/* Definitions from /sys/crypto/xform.c */
 
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
-   if (sysctl(mib, 2, old, olen, NULL, 0)  0)
-   err(1, sysctl failed);
+#define AESCTR_NONCESIZE   4
+#define AESCTR_IVSIZE  8
+#define AESCTR_BLOCKSIZE   16
 
-   return old;
-}
+struct aes_ctr_ctx {
+   u_int32_t   ac_ek[4*(AES_MAXROUNDS + 1)];
+   u_int8_tac_block[AESCTR_BLOCKSIZE];
+   int ac_nr;
+};
 
-static void
-setallowsoft(int new)
-{
-   int mib[2], old;
-   size_t olen, nlen;
+int  aes_ctr_setkey(void *, u_int8_t *, int);
+void aes_ctr_encrypt(caddr_t, u_int8_t *);
+void aes_ctr_decrypt(caddr_t, u_int8_t *);
+void aes_ctr_reinit(caddr_t, u_int8_t *);
 
-   olen = nlen = sizeof(new);
+static int
+docrypt(const unsigned char *key, size_t klen, const unsigned char *iv,
+const unsigned char *in, unsigned char *out, size_t len, int encrypt)
+{
+   u_int8_t block[AESCTR_BLOCKSIZE];
+   struct aes_ctr_ctx ctx;
+   int error = 0;
+   size_t i;
 
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
+ 

[regress] convert enc (3des) test from /dev/crypto

2014-08-13 Thread Mike Belopuhov
this one with a bit of cheating however (manual cbc implementation).

OK?

diff --git regress/sys/crypto/enc/Makefile regress/sys/crypto/enc/Makefile
index cc29b32..8725f0c 100644
--- regress/sys/crypto/enc/Makefile
+++ regress/sys/crypto/enc/Makefile
@@ -1,12 +1,21 @@
 #   $OpenBSD: Makefile,v 1.5 2010/10/15 10:39:12 jsg Exp $
 
+DIR=   ${.CURDIR}/../../../../sys
+
+CFLAGS+=   -I${DIR}
+
 PROG=   des3
+SRCS=  des3.c
 LDADD=-lcrypto
 DPADD=${LIBCRYPTO}
 
 REGRESS_ROOT_TARGETS=  run-regress-${PROG}
 
+.PATH: ${DIR}/crypto
+SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c
+SRCS+= xform.c
+
 run-regress-${PROG}: ${PROG}
-   ${SUDO} ./${PROG}
+   ./${PROG}
 
 .include bsd.regress.mk
diff --git regress/sys/crypto/enc/des3.c regress/sys/crypto/enc/des3.c
index 024418d..fe67872 100644
--- regress/sys/crypto/enc/des3.c
+++ regress/sys/crypto/enc/des3.c
@@ -22,105 +22,73 @@
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include sys/types.h
 #include sys/param.h
-#include sys/ioctl.h
-#include sys/sysctl.h
-#include crypto/cryptodev.h
 #include openssl/des.h
 #include err.h
 #include fcntl.h
 #include stdio.h
 #include stdlib.h
 #include string.h
 #include unistd.h
 
-static int
-syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv,
-const unsigned char *in, unsigned char *out, size_t len, int encrypt)
-{
-   struct session_op session;
-   struct crypt_op cryp;
-   int cryptodev_fd = -1, fd = -1;
-
-   if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0))  0) {
-   warn(/dev/crypto);
-   goto err;
-   }
-   if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) {
-   warn(CRIOGET failed);
-   goto err;
-   }
-   memset(session, 0, sizeof(session));
-   session.cipher = CRYPTO_3DES_CBC;
-   session.key = (caddr_t) key;
-   session.keylen = klen;
-   if (ioctl(fd, CIOCGSESSION, session) == -1) {
-   warn(CIOCGSESSION);
-   goto err;
-   }
-   memset(cryp, 0, sizeof(cryp));
-   cryp.ses = session.ses;
-   cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT;
-   cryp.flags = 0;
-   cryp.len = len;
-   cryp.src = (caddr_t) in;
-   cryp.dst = (caddr_t) out;
-   cryp.iv = (caddr_t) iv;
-   cryp.mac = 0;
-   if (ioctl(fd, CIOCCRYPT, cryp) == -1) {
-   warn(CIOCCRYPT);
-   goto err;
-   }
-   if (ioctl(fd, CIOCFSESSION, session.ses) == -1) {
-   warn(CIOCFSESSION);
-   goto err;
-   }
-   close(fd);
-   close(cryptodev_fd);
-   return (0);
+/* Stubs */
 
-err:
-   if (fd != -1)
-   close(fd);
-   if (cryptodev_fd != -1)
-   close(cryptodev_fd);
-   return (-1);
-}
+u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **);
 
-static int
-getallowsoft(void)
+u_int32_t
+deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out)
 {
-   int mib[2], old;
-   size_t olen;
-
-   olen = sizeof(old);
-
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
-   if (sysctl(mib, 2, old, olen, NULL, 0)  0)
-   err(1, sysctl failed);
-
-   return old;
+   return 0;
 }
 
-static void
-setallowsoft(int new)
+void   explicit_bzero(void *, size_t);
+
+void
+explicit_bzero(void *b, size_t len)
 {
-   int mib[2], old;
-   size_t olen, nlen;
+   bzero(b, len);
+}
 
-   olen = nlen = sizeof(new);
 
-   mib[0] = CTL_KERN;
-   mib[1] = KERN_CRYPTODEVALLOWSOFT;
+/* Simulate CBC mode */
 
-   if (sysctl(mib, 2, old, olen, new, nlen)  0)
-   err(1, sysctl failed);
+static int
+docrypt(const unsigned char *key, size_t klen, const unsigned char *iv0,
+const unsigned char *in, unsigned char *out, size_t len, int encrypt)
+{
+   u_int8_t block[8], iv[8], iv2[8], *ivp = iv, *nivp;
+   u_int8_t ctx[384];
+   int i, j, error = 0;
+
+   memcpy(iv, iv0, 8);
+   memset(ctx, 0, sizeof(ctx));
+   error = des3_setkey(ctx, key, klen);
+   if (error)
+   return -1;
+   for (i = 0; i  len / 8; i ++) {
+   bcopy(in, block, 8);
+   in += 8;
+   if (encrypt) {
+   for (j = 0; j  8; j++)
+   block[j] ^= ivp[j];
+   des3_encrypt(ctx, block);
+   memcpy(ivp, block, 8);
+   } else {
+   nivp = ivp == iv ? iv2 : iv;
+   memcpy(nivp, block, 8);
+   des3_decrypt(ctx, block);
+   for (j = 0; j  8; j++)
+   block[j] ^= ivp[j];
+   ivp = nivp;
+   }
+   

Re: pf: once for match rules?

2014-08-12 Thread Mike Belopuhov
On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote:
 Hi,
 
 Before I send a diff for pfctl to disable once on match rules,
 I've decided to try and see how much work is it to make it actually
 work.  Turns out that I need to extend pf_rule_item by 3 pointers
 to track the match rule ruleset, anchor rule and the ruleset it
 belongs to.
 
 Here's what this means in practice.  Consider a ruleset:
 
  block drop all
  match out log proto tcp to port 22 once
  anchor foo all {
match out log proto tcp to port 22 once
anchor bar all {
  match out log proto tcp to port 22 once
  pass out quick proto tcp to port 22 once
}
  }
 
 Once we send a packet to port 22 the ruleset collapses to just:
 
  block drop all
 
 Thoughts?

Henning thinks it's a bit of an overkill.  Any other opinions?

 
 diff --git sys/net/pf.c sys/net/pf.c
 index 9f0e2d6..5679a40 100644
 --- sys/net/pf.c
 +++ sys/net/pf.c
 @@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule 
 **rm, struct pf_state **sm,
   PR_NOWAIT)) == NULL) {
   REASON_SET(reason, PFRES_MEMORY);
   goto cleanup;
   }
   ri-r = r;
 + ri-ar = a;
 + ri-rs = ruleset;
 + ri-ars = aruleset;
   /* order is irrelevant */
   SLIST_INSERT_HEAD(rules, ri, entry);
   pf_rule_to_actions(r, act);
 - if (r-rule_flag  PFRULE_AFTO)
 - pd-naf = r-naf;
   if (pf_get_transaddr(r, pd, sns, nr) == -1) {
   REASON_SET(reason, PFRES_TRANSLATE);
   goto cleanup;
   }
   if (r-log || act.log  PF_LOG_MATCHES) {
 @@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule 
 **rm, struct pf_state **sm,
   virtual_type, icmp_dir);
   }
   } else {
   while ((ri = SLIST_FIRST(rules))) {
   SLIST_REMOVE_HEAD(rules, entry);
 + if (ri-r-rule_flag  PFRULE_ONCE)
 + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar);
   pool_put(pf_rule_item_pl, ri);
   }
   }
  
   /* copy back packet headers if needed */
 @@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule 
 **rm, struct pf_state **sm,
   }
  #endif
  
   if (r-rule_flag  PFRULE_ONCE)
   pf_purge_rule(ruleset, r, aruleset, a);
 + if (*sm) {
 + SLIST_FOREACH(ri, (*sm)-match_rules, entry) {
 + if (ri-r-rule_flag  PFRULE_ONCE)
 + /*
 +  * We can be sure that pf_purge_rule won't
 +  * pool_put the rule because when *sm != NULL
 +  * STATE_INC_COUNTERS has increased states_cur.
 +  * pf_rule_item's and rules will be g/c'ed by
 +  * pf_free_state.
 +  */
 + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar);
 + }
 + }
  
  #if INET  INET6
   if (rewrite  skw-af != sks-af)
   return (PF_AFRT);
  #endif /* INET  INET6 */
 diff --git sys/net/pfvar.h sys/net/pfvar.h
 index a0d94f7..49af7b4 100644
 --- sys/net/pfvar.h
 +++ sys/net/pfvar.h
 @@ -691,10 +691,13 @@ struct pf_threshold {
  };
  
  struct pf_rule_item {
   SLIST_ENTRY(pf_rule_item)entry;
   struct pf_rule  *r;
 + struct pf_rule  *ar;
 + struct pf_ruleset   *rs;
 + struct pf_ruleset   *ars;
  };
  
  SLIST_HEAD(pf_rule_slist, pf_rule_item);
  
  enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX 
 };



pf: fixup pf_step_into_anchor to save current anchor rule pointer (2)

2014-07-22 Thread Mike Belopuhov
Hi,

This is a second diff and it makes sure that pf_step_into_anchor always
saves a pointer to the rule that owns the anchor on the pf anchor stack.
There's no reason why we should check for depth here.  As a side effect
this makes sure that the correct nested anchor gets it's counter bumped
instead of the top most.

For the save/restore symmetry pf_step_out_of_anchor is made to always
restore previous value of the anchor rule.  depth == 0 means what we a
at the top (main ruleset).

OK?

diff --git sys/net/pf.c sys/net/pf.c
index 9832e47..8708417 100644
--- sys/net/pf.c
+++ sys/net/pf.c
@@ -2666,11 +2666,11 @@ pf_step_into_anchor(int *depth, struct pf_ruleset **rs,
if (*depth = sizeof(pf_anchor_stack) /
sizeof(pf_anchor_stack[0])) {
log(LOG_ERR, pf_step_into_anchor: stack overflow\n);
*r = TAILQ_NEXT(*r, entries);
return;
-   } else if (*depth == 0  a != NULL)
+   } else if (a != NULL)
*a = *r;
f = pf_anchor_stack + (*depth)++;
f-rs = *rs;
f-r = *r;
if ((*r)-anchor_wildcard) {
@@ -2711,10 +2711,12 @@ pf_step_out_of_anchor(int *depth, struct pf_ruleset 
**rs,
}
}
(*depth)--;
if (*depth == 0  a != NULL)
*a = NULL;
+   else if (a != NULL)
+   *a = f-r;
*rs = f-rs;
if (*match  *depth) {
*match = *depth;
if (f-r-quick)
quick = 1;



pf: once for match rules?

2014-07-22 Thread Mike Belopuhov
Hi,

Before I send a diff for pfctl to disable once on match rules,
I've decided to try and see how much work is it to make it actually
work.  Turns out that I need to extend pf_rule_item by 3 pointers
to track the match rule ruleset, anchor rule and the ruleset it
belongs to.

Here's what this means in practice.  Consider a ruleset:

 block drop all
 match out log proto tcp to port 22 once
 anchor foo all {
   match out log proto tcp to port 22 once
   anchor bar all {
 match out log proto tcp to port 22 once
 pass out quick proto tcp to port 22 once
   }
 }

Once we send a packet to port 22 the ruleset collapses to just:

 block drop all

Thoughts?

diff --git sys/net/pf.c sys/net/pf.c
index 9f0e2d6..5679a40 100644
--- sys/net/pf.c
+++ sys/net/pf.c
@@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
PR_NOWAIT)) == NULL) {
REASON_SET(reason, PFRES_MEMORY);
goto cleanup;
}
ri-r = r;
+   ri-ar = a;
+   ri-rs = ruleset;
+   ri-ars = aruleset;
/* order is irrelevant */
SLIST_INSERT_HEAD(rules, ri, entry);
pf_rule_to_actions(r, act);
-   if (r-rule_flag  PFRULE_AFTO)
-   pd-naf = r-naf;
if (pf_get_transaddr(r, pd, sns, nr) == -1) {
REASON_SET(reason, PFRES_TRANSLATE);
goto cleanup;
}
if (r-log || act.log  PF_LOG_MATCHES) {
@@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
virtual_type, icmp_dir);
}
} else {
while ((ri = SLIST_FIRST(rules))) {
SLIST_REMOVE_HEAD(rules, entry);
+   if (ri-r-rule_flag  PFRULE_ONCE)
+   pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar);
pool_put(pf_rule_item_pl, ri);
}
}
 
/* copy back packet headers if needed */
@@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
}
 #endif
 
if (r-rule_flag  PFRULE_ONCE)
pf_purge_rule(ruleset, r, aruleset, a);
+   if (*sm) {
+   SLIST_FOREACH(ri, (*sm)-match_rules, entry) {
+   if (ri-r-rule_flag  PFRULE_ONCE)
+   /*
+* We can be sure that pf_purge_rule won't
+* pool_put the rule because when *sm != NULL
+* STATE_INC_COUNTERS has increased states_cur.
+* pf_rule_item's and rules will be g/c'ed by
+* pf_free_state.
+*/
+   pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar);
+   }
+   }
 
 #if INET  INET6
if (rewrite  skw-af != sks-af)
return (PF_AFRT);
 #endif /* INET  INET6 */
diff --git sys/net/pfvar.h sys/net/pfvar.h
index a0d94f7..49af7b4 100644
--- sys/net/pfvar.h
+++ sys/net/pfvar.h
@@ -691,10 +691,13 @@ struct pf_threshold {
 };
 
 struct pf_rule_item {
SLIST_ENTRY(pf_rule_item)entry;
struct pf_rule  *r;
+   struct pf_rule  *ar;
+   struct pf_ruleset   *rs;
+   struct pf_ruleset   *ars;
 };
 
 SLIST_HEAD(pf_rule_slist, pf_rule_item);
 
 enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX };



Re: [PATCH] rdomain support on rc.d

2014-07-11 Thread Mike Belopuhov
On 11 July 2014 10:29, Antoine Jacoutot ajacou...@bsdfrog.org wrote:
 On Thu, Jul 10, 2014 at 06:51:01PM +0200, Loďc BLOT wrote:
 Hello all,
 I use rdomains to split routing domains per company and also separate
 administration interfaces from routing interfaces on my routers (sshd,
 bacula, postfix and puppetd running on a dedicated rdomain)

 Actually there is a problem with rdomains, we need to modify /etc/rc.d
 scripts to add rdomain execution environment to the specified service.
 If rc.subr have support to rdomains, we can let the rc.d scripts clean.

 To resolve those rdomain issues, I created a patch and I added a new
 variable we could use on rc.conf(.local), ${_name}_rdomain. (This
 variable needs a signed integer and use an existing rdomain, this is
 checked by rc.subr.

 I want to contribute to OpenBSD and I give you this patch. If you have
 any suggestions to improve it, tell me.

 I don't use rdomain so someone knowledgeable should comment here.
 But it does look like a nice idea.


having something like this would be really cool.  in case you'll be
tweaking the code, make sure that the route -T exec printf check
is preserved.  i would use true in this test however.

as far as i can tell the daemon_rdomain bit that goes into the rc
script is fine, however i'm not quite sure how can i start two
daemons in different rdomains via rc.conf.local.  looks like this
diff doesn't handle this and allows only one instance in the
${_name}_rdomain rdomain.  but sometimes you want multiple, say
sshd in rdomain 0 and 1.  daemon_rdomain flag allows me to go and
create another rc.d/sshd-rdomain-1 script and stuff daemon_rdomain=1
in there.  but then i'd have to add it to the pkg_scripts...  this
is a minor issue that i see.  perhaps ${_name}_rdomain should list
multiple values, like sshd_rdomain=0,1,2,3.



Re: PF Once rules are not removed from main anchor

2014-07-02 Thread Mike Belopuhov
On 21 June 2014 15:36, Alexandr Nedvedicky alexandr.nedvedi...@oracle.com 
wrote:
 Hello,

 I'm not sure it is the right place to submit patches. Let me know if there is
 better/more appropriate address for this.

 during our testing we've found the once rules are not removed,
 when used in main anchor.


Correct.  I've addedd the ruleset pointer check to prevent that
shortly before the release since it caused panics...

 during debugging we found the rules in main anchor have member anchor set to
 NULL (pf_rule::anchor). This makes pf_purge_rule() function to bail out
 to early without removing the rule from ruleset.

 patch below fixed problem for us.


However, this solution is not correct for us.  Perhaps you have some
other changes in your tree to make it work.

 Index: pf_ioctl.c
 ===
 RCS file: /cvs/src/sys/net/pf_ioctl.c,v
 retrieving revision 1.272
 diff -u -r1.272 pf_ioctl.c
 --- pf_ioctl.c  22 Apr 2014 14:41:03 -  1.272
 +++ pf_ioctl.c  20 Jun 2014 14:26:22 -
 @@ -312,7 +312,7 @@
  {
 u_int32_tnr;

 -   if (ruleset == NULL || ruleset-anchor == NULL)
 +   if (ruleset == NULL)
 return;


ruleset-anchor is useless since nothing really checks for it if
ruleset is NULL.

 pf_rm_rule(ruleset-rules.active.ptr, rule);
 @@ -325,7 +325,10 @@
 ruleset-rules.active.ticket++;

 pf_calc_skip_steps(ruleset-rules.active.ptr);
 -   pf_remove_if_empty_ruleset(ruleset);
 +
 +   if (ruleset != pf_main_ruleset) {
 +   pf_remove_if_empty_ruleset(ruleset);
 +   }
  }


You don't need to check ruleset against pf_main_ruleset since
the first thing pf_remove_if_empty_ruleset does is bail if
ruleset == pf_main_ruleset.  This bit confused me quite a bit.

What really is a problem for us is that when pf_purge_rule is
called on a main ruleset from pf_test_rule the first argument
(ruleset) is NULL and not pf_main_ruleset, which would be
sensible.  The only other user of it, AFAICT, is pflog_packet
but it checks for ruleset-anchor before it does it's thing.
I don't see any reason why we can't start our rule set iteration
with 'ruleset' set to pf_main_ruleset (regress tests agree).

OK?

diff --git sys/net/pf.c sys/net/pf.c
index 71f85d1..562901d 100644
--- sys/net/pf.c
+++ sys/net/pf.c
@@ -3163,10 +3163,11 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
}
break;
 #endif /* INET6 */
}
 
+   ruleset = pf_main_ruleset;
r = TAILQ_FIRST(pf_main_ruleset.rules.active.ptr);
while (r != NULL) {
r-evaluations++;
PF_TEST_ATTRIB((pfi_kif_match(r-kif, pd-kif) == r-ifnot),
r-skip[PF_SKIP_IFP].ptr);
diff --git sys/net/pf_ioctl.c sys/net/pf_ioctl.c
index 5ad762c..86e987f 100644
--- sys/net/pf_ioctl.c
+++ sys/net/pf_ioctl.c
@@ -308,24 +308,19 @@ pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule 
*rule)
 }
 
 void
 pf_purge_rule(struct pf_ruleset *ruleset, struct pf_rule *rule)
 {
-   u_int32_tnr;
+   u_int32_tnr = 0;
 
-   if (ruleset == NULL || ruleset-anchor == NULL)
-   return;
+   KASSERT(ruleset != NULL  rule != NULL);
 
pf_rm_rule(ruleset-rules.active.ptr, rule);
ruleset-rules.active.rcount--;
-
-   nr = 0;
TAILQ_FOREACH(rule, ruleset-rules.active.ptr, entries)
rule-nr = nr++;
-
ruleset-rules.active.ticket++;
-
pf_calc_skip_steps(ruleset-rules.active.ptr);
pf_remove_if_empty_ruleset(ruleset);
 }
 
 u_int16_t



Re: ANONCVS MIRROR MAINTAINERS.. YOU NEED TO READ THIS!

2014-06-26 Thread Mike Belopuhov
On 26 June 2014 08:53, patrick keshishian sids...@boxsoft.com wrote:
 On Wed, Jun 25, 2014 at 10:01:06PM -0700, patrick keshishian wrote:
 On Thu, Jun 26, 2014 at 06:37:00AM +0200, Alexander Hall wrote:
  On 06/25/14 20:52, Bob Beck wrote:
  If you or someone you love runs an anoncvs server, they need to see this.
  
  As you know we recently added commitid support to cvs, and we had
  you update your cvsync binary.
  
  Unfortunately, the fix wasn't quite right.  We ran into problems
  with the synching of commitid files. naddy managed to cook
  a correct fix for us.
  
  anoncvs.ca (the fanout machine) has been fixed - again. What you
  need to do is to update your cvsync to the latest one that was
  just committed into ports (cvsync-0.25.0pre0 or later). Remove your
  scanfile (if any).
  
  Once you do that, re-run cvsync and you should be back in business.
 
  Is it known whether this requires a fixed upstream cvsync first, or if
  it is ok to apply the fix and your repo will become fine whenever the
  upstream is fixed?
 
  Anyone with insight, please feel free to chime in.


 The commit message suggests, at least, the server must be
 updated:

 | Add full support for commitid and bump protocol version
 | Old clients will receive updates with commitid stripped out.


 It seems, running updated cvsync against an un-updated
 cvsyncd (I assume) results in failure:

 Connecting to anoncvs1.usa.openbsd.org port 
 Connected to 149.20.54.217 port 
 Running...
 Updating (collection openbsd-src/rcs)
 Updater(RCS): UPDATE(delta): error
 Updater(RCS): UPDATE: 
 /cvs/anoncvs/cvs/obsd/src/lib/libc/string/timingsafe_bcmp.3,v
 Updater: RCS Error
 Socket Error: recv: 2 residue 2
 Receiver Error: recv
 Mux(SEND) Error: not running: 0
 DirScan: RCS Error
 Mux(SEND) Error: not running: 1
 FileScan(RCS): UPDATE /cvs/anoncvs/cvs/obsd/src/sbin/pfctl/parse.y,v
 FileScan: RCS Error
 Failed



removing offending files locally and restarting cvsync helps.



pfctl: better af-to translation specs handling

2014-06-25 Thread Mike Belopuhov
Hi,

collapse_redirspec is one of my pet peeve since the af-to support.
Unfortunately we didn't put much effort in making it work well back
then, but now it is time for a change!  Improving upon the last
diff here's a collapsed version of the collapse_redirspec, so to
speak.  Instead of having two loops: first counts addresses, the
second one performs an action, here's a merged version that has all
the checks done once and in the right order.

No changes in the regress test output (all tests pass) and the
gain is that we no longer need to explicitely set the address
family for af-to rules, it will be deduced in most of the cases.
For example, previously you'd have to do:

  pass in inet af-to inet6 from ::1

While it's clear that you're translating IPv4 to the IPv6.  Now
it's possible to omit the 'inet' part:

  pass in af-to inet6 from ::1

Which is in line with how nat-to and rdr-to target addresses
affect selection of the address family for the whole rule, i.e.:

  % echo 'pass in rdr-to ::1' | pfctl -vnf -
  pass in inet6 all flags S/SA rdr-to ::1

(Of course af-to rule still needs the 'inet6' part, but that
will require some other changes as well...)

OK?

P.S. This diff moves chunks around quite a bit and might require
applying to tell the difference.

diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y
index ce80f8e..fabe210 100644
--- sbin/pfctl/parse.y
+++ sbin/pfctl/parse.y
@@ -2025,26 +2025,21 @@ filter_opt  : USER uids {
filter_opts.nat.rdr = $2;
memcpy(filter_opts.nat.pool_opts, $3,
sizeof(filter_opts.nat.pool_opts));
}
| AFTO af FROM redirpool pool_opts {
if (filter_opts.nat.rdr) {
yyerror(cannot respecify af-to);
YYERROR;
}
if ($2 == 0) {
-   yyerror(no address family specified);
-   YYERROR;
-   }
-   if ($4-host-af  $4-host-af != $2) {
-   yyerror(af-to addresses must be in the 
-   target address family);
+   yyerror(no target address family specified);
YYERROR;
}
filter_opts.nat.af = $2;
filter_opts.nat.rdr = $4;
memcpy(filter_opts.nat.pool_opts, $5,
sizeof(filter_opts.nat.pool_opts));
filter_opts.rdr.rdr =
calloc(1, sizeof(struct redirection));
bzero(filter_opts.rdr.pool_opts,
sizeof(filter_opts.rdr.pool_opts));
@@ -3482,21 +3477,21 @@ redirspec   : host optweight{
n-weight = $2;
}
$$ = $1;
}
| '{' optnl redir_host_list '}' { $$ = $3; }
;
 
 redir_host_list: host optweight optnl  {
if ($1-addr.type != PF_ADDR_ADDRMASK) {
free($1);
-   yyerror(only addresses can be listed for
+   yyerror(only addresses can be listed for 
redirection pools );
YYERROR;
}
if ($2  0) {
struct node_host*n;
for (n = $1; n != NULL; n = n-next)
n-weight = $2;
}
$$ = $1;
}
@@ -4288,118 +4283,139 @@ expand_queue(char *qname, struct node_if *interfaces, 
struct queue_opts *opts)
 
FREE_LIST(struct node_if, interfaces);
return (0);
 }
 
 int
 collapse_redirspec(struct pf_pool *rpool, struct pf_rule *r,
 struct redirspec *rs, u_int8_t allow_if)
 {
struct pf_opt_tbl *tbl = NULL;
-   struct node_host *h;
+   struct node_host *h, *hprev = NULL;
struct pf_rule_addr ra;
-   int af = 0, i = 0;
+   int af = 0, naddr = 0;
 
if (!rs || !rs-rdr || rs-rdr-host == NULL) {
rpool-addr.type = PF_ADDR_NONE;
return (0);
}
 
if (r-rule_flag  PFRULE_AFTO)
r-naf = rs-af;
 
-   /* count matching addresses */
for (h = rs-rdr-host; h != NULL; h = h-next) {
-   if (h-af) {
-   /* check that af-to target address has correct af */
-   if (rs-af  rs-af != h-af) {
-   yyerror(af mismatch in %s spec,
-   allow_if ? routing : translation);
-  

Re: pfctl: stricter redirect specs

2014-06-24 Thread Mike Belopuhov
On Tue, Jun 24, 2014 at 15:07 +0200, Mike Belopuhov wrote:
 I have carefully tested that and do not expect any unrelated
 fallout.  And for the reasons stated above I don't believe
 anyone is using this since it's largely error prone.
 

and a regress chunk that avoids using such combination.

note that this diff basically finishes previous attemts at
modifying tests to match present syntax.

diff --git regress/sbin/pfctl/pf48.in regress/sbin/pfctl/pf48.in
index 540711a..a0dd143 100644
--- regress/sbin/pfctl/pf48.in
+++ regress/sbin/pfctl/pf48.in
@@ -1,12 +1,12 @@
 table  regress  { 1.2.3.4 !5.6.7.8 10/8 lo0 }
 table regress.1 const { ::1 fe80::/64 }
 table regress.a { 1.2.3.4 !5.6.7.8 } { ::1 ::2 ::3 } file /dev/null const 
{ 4.3.2.1 }
-match out on lo0 from  regress.1 to regress.2 nat-to lo0:0
+match out on lo0 inet from  regress.1 to regress.2 nat-to lo0:0
 match out on !lo0 inet from !regress.1  to regress.2 nat-to lo0:0
 match in on lo0 inet6 from regress.1 to regress.2 rdr-to lo0:0
-match in on !lo0 from ! regress.1  to regress.2 rdr-to lo0:0
+match in on !lo0 inet6 from ! regress.1  to regress.2 rdr-to lo0:0
 match in from { regress.1 !regress.2 } to any
 match out from any to { !regress.1, regress.2 }
 pass in from regress to any
 pass out from any to regress 
 pass in from { regress.1 regress.2 } to any
diff --git regress/sbin/pfctl/pf48.loaded regress/sbin/pfctl/pf48.loaded
index 68b9854..cfdc8a2 100644
--- regress/sbin/pfctl/pf48.loaded
+++ regress/sbin/pfctl/pf48.loaded
@@ -1,7 +1,7 @@
-@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1
-  [ Skip steps: d=2 p=end da=4 sp=end dp=end ]
+@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1
+  [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ]
   [ queue: qname= qid=0 pqname= pqid=0 ]
   [ Evaluations: 0 Packets: 0 Bytes: 0   States: 0 
]
 @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 
127.0.0.1
   [ Skip steps: p=end da=4 sp=end dp=end ]
   [ queue: qname= qid=0 pqname= pqid=0 ]
diff --git regress/sbin/pfctl/pf48.ok regress/sbin/pfctl/pf48.ok
index 5f6e6ab..aeccfcf 100644
--- regress/sbin/pfctl/pf48.ok
+++ regress/sbin/pfctl/pf48.ok
@@ -1,9 +1,9 @@
 table regress { 1.2.3.4 !5.6.7.8 10.0.0.0/8 ::1 fe80::1 127.0.0.1 }
 table regress.1 const { ::1 fe80::/64 }
 table regress.a const { 1.2.3.4 !5.6.7.8 ::1 ::2 ::3 } file /dev/null { 
4.3.2.1 }
-match out on lo0 inet6 from regress.1 to regress.2 nat-to ::1
+match out on lo0 inet from regress.1 to regress.2 nat-to 127.0.0.1
 match out on ! lo0 inet from ! regress.1 to regress.2 nat-to 127.0.0.1
 match in on lo0 inet6 from regress.1 to regress.2 rdr-to ::1
 match in on ! lo0 inet6 from ! regress.1 to regress.2 rdr-to ::1
 match in from regress.1 to any
 match in from ! regress.2 to any
diff --git regress/sbin/pfctl/pf48.optimized regress/sbin/pfctl/pf48.optimized
index 63309a7..90e2036 100644
--- regress/sbin/pfctl/pf48.optimized
+++ regress/sbin/pfctl/pf48.optimized
@@ -1,7 +1,7 @@
-@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1
-  [ Skip steps: d=2 p=end da=4 sp=end dp=end ]
+@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1
+  [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ]
   [ queue: qname= qid=0 pqname= pqid=0 ]
   [ Evaluations: 0 Packets: 0 Bytes: 0   States: 0 
]
 @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 
127.0.0.1
   [ Skip steps: p=end da=4 sp=end dp=end ]
   [ queue: qname= qid=0 pqname= pqid=0 ]
diff --git regress/sbin/pfctl/pf98.in regress/sbin/pfctl/pf98.in
index a18a491..a224f9e 100644
--- regress/sbin/pfctl/pf98.in
+++ regress/sbin/pfctl/pf98.in
@@ -1,4 +1,4 @@
 # Test rule order processing should pass (require-order no longer required)
 pass in on lo100 all
-match out on lo0 all nat-to lo0
+match out on lo0 inet6 all nat-to lo0
 



Re: pf anchor references

2014-06-05 Thread Mike Belopuhov
On Mon, Jun 02, 2014 at 17:51 +0200, Mike Belopuhov wrote:
 Hi,
 
 I've been chasing some bugs in the pfctl anchor code for a couple
 of weeks and I'm not astonished at how loose the handling is in
 general.  Lot's of rules and checks are being violated by some
 code paths while honoured by others.  The problem I have is that
 it's truly a rabbit's hole: almost every bug I hit is a feature
 in disguise.
 
 One thing that I particularly hate is called an anchor reference.
 A good example of this is a ruleset like this:
 
 anchor foo {
 block all
 }
 anchor foo
 
 If you believe it should be a syntax error let me disappoint you.
 The second 'anchor foo' is just a reference to the anchor named
 foo defined before:
 
 # echo -e 'anchor foo {\n block all\n}\nanchor foo' | pfctl -f -
 # pfctl -a '*' -sr
 anchor foo all {
   block drop all
 }
 anchor foo all {
   block drop all
 }
 # echo -e 'pass all' | pfctl -a 'foo' -f -
 # pfctl -a '*' -sr
 anchor foo all {
   pass all flags S/SA
 }
 anchor foo all {
   pass all flags S/SA
 }
 
 And to demonstrate a reference specified by an absolute path we can
 add anchor /foo inside foo:
 
 # echo -e 'anchor /foo' | pfctl -a 'foo' -f -
 
 This obviously gets us an endless loop if we decide to print out
 contents recursively (pfctl -a '*' -sr).  Thankfully this doesn't
 affect ruleset evaluation in the kernel.
 
 The basic question I have is why do we need this dangerous and (at
 least in my eyes) pretty useless mechanism?
 
 Any opinions on this?  Does anyone make any sensible use of it?
 Should we try to simplify this by removing ambiguous functionality?
 
 Cheers,
 Mike

While I've got a few responses on tech supporting me, I should
probably ask misc@ as well for additional scrutiny.  And besides,
I've come up with an example that might simplify ruleset creation
for say multiple customer vlans or rdomains.  Does anyone use it
like that?  This looks like an anchor template that we want to
specify but not use directly in the main ruleset, but it plays
somewhat nicely with quick anchors.

(The ruleset below was tested and works correctly)

Cheers,
Mike


block all

anchor customer1 quick on rdomain 1 {
anchor /allow-egress
anchor /allow-ssh
anchor /allow-icmp-echo
}

anchor customer2 quick on rdomain 2 {
anchor /allow-egress
anchor /allow-ssh
}

pass in quick on rdomain 0 proto tcp to (self) port ssh
pass out quick on rdomain 0

# end of ruleset evaluation
block quick

anchor allow-ssh {
pass in proto tcp to (self) port ssh
}

anchor allow-icmp-echo {
pass in inet proto icmp to (self) icmp-type echoreq code 0
}

anchor allow-egress {
pass out
}



Re: Remove a global variable in ip_input

2014-06-04 Thread Mike Belopuhov
On 4 June 2014 12:30, Martin Pieuchot mpieuc...@nolizard.org wrote:
 ok?


sure



Re: in_pcbbind() and in_broadcast/in_iawithaddr

2014-06-04 Thread Mike Belopuhov
On 3 June 2014 09:18, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 02/06/14(Mon) 15:45, Martin Pieuchot wrote:
 This diff is similar to the one that has been committed to handle the
 SOCK_RAW binding.  I'd like to stop using in_iawithaddr() *and*
 in_broadcast().  Since these functions are just doing an iteration on
 all the addresses present in the RB-tree (or equivalent), let's use
 ifa_ifwithaddr() instead.

 This diff should not introduce any behavior change concerning SOCK_DGRAM
 and binding to multicast addresses.

 As pointed out by jca@ this diff breaks on loopback.  This is because
 the loopback IPv4 addresses are abusing the dstaddr field to be able to
 create a route to their address.  Hopefully this hack can be removed
 once the local route diff is in, but in the meantime let's use the less
 intuitive but equivalent idiom:

 sin-sin_addr.s_addr != ia-ia_addr.s_addr

 Update diff below, is this one ok?


OK



Re: PRU_BIND in raw ip

2014-06-02 Thread Mike Belopuhov
On 28 May 2014 13:36, Martin Pieuchot mpieuc...@nolizard.org wrote:
 On 28/05/14(Wed) 09:30, Jérémie Courrèges-Anglas wrote:
 Martin Pieuchot mpieuc...@nolizard.org writes:

  Diff below replace in_iawithaddr() + in_broadcast() - ifa_ifwithaddr(),
  that does the same for IPv4 since broadcast addresses are added to the
  tree.

 This prevents listeners to bind on 255.255.255.255, something allowed
 with the current code.  Thoughts?

 You're right, here's an updated diff that keeps this behavior.


OK



Re: Remove p2p loopback hack in nd6_rtrequest()

2014-05-13 Thread Mike Belopuhov
On Mon, May 12, 2014 at 12:48 +0200, Martin Pieuchot wrote:
 On 07/05/14(Wed) 12:46, Martin Pieuchot wrote:
  Diff below stops abusing nd6_rtrequest() for loopback interfaces, which
  means we can remove the special hack below and reduce the differences
  with arp_rtrequest().
  
  This diff introduces two changes in the inet6 routing table, but they
  should not matter.  The first one is that the gateway of the link-local
  entry for loopback interfaces is no longer a buggy link-layer address:
  
  -fe80::1%lo0link#6 UHL  
  00 - 4 lo0  
  +fe80::1%lo0fe80::1%lo0UHL  
  00 - 4 lo0
  
  The second one is that every route to network associated to the
  loopback interface will now have the mtu of this interface:
  
  -ff02::/16  ::1UGRS 
  00 - 8 lo0  
  +ff02::/16  ::1UGRS 
  00 33192 8 lo0  
  
  Both changes are consistent with the IPv4 behavior, ok?
 
 Anyone?
 

OK

  
  
  Index: netinet6/in6.c
  ===
  RCS file: /home/ncvs/src/sys/netinet6/in6.c,v
  retrieving revision 1.136
  diff -u -p -r1.136 in6.c
  --- netinet6/in6.c  5 May 2014 11:44:33 -   1.136
  +++ netinet6/in6.c  7 May 2014 10:29:24 -
  @@ -1399,7 +1399,7 @@ in6_ifinit(struct ifnet *ifp, struct in6
  /* Add ownaddr as loopback rtentry, if necessary (ex. on p2p link). */
  if (newhost) {
  /* set the rtrequest function to create llinfo */
  -   if ((ifp-if_flags  IFF_POINTOPOINT) == 0)
  +   if ((ifp-if_flags  (IFF_LOOPBACK | IFF_POINTOPOINT)) == 0)
  ia6-ia_ifa.ifa_rtrequest = nd6_rtrequest;
   
  rt_ifa_addloop((ia6-ia_ifa));
  Index: netinet6/nd6.c
  ===
  RCS file: /home/ncvs/src/sys/netinet6/nd6.c,v
  retrieving revision 1.116
  diff -u -p -r1.116 nd6.c
  --- netinet6/nd6.c  7 May 2014 08:14:59 -   1.116
  +++ netinet6/nd6.c  7 May 2014 10:29:24 -
  @@ -1059,20 +1059,14 @@ nd6_rtrequest(int req, struct rtentry *r
   #endif
  /* FALLTHROUGH */
  case RTM_RESOLVE:
  -   if ((ifp-if_flags  (IFF_POINTOPOINT | IFF_LOOPBACK)) == 0) {
  -   /*
  -* Address resolution isn't necessary for a point to
  -* point link, so we can skip this test for a p2p link.
  -*/
  -   if (gate-sa_family != AF_LINK ||
  -   gate-sa_len  sizeof(null_sdl)) {
  -   log(LOG_DEBUG, %s: bad gateway value: %s\n,
  -   __func__, ifp-if_xname);
  -   break;
  -   }
  -   SDL(gate)-sdl_type = ifp-if_type;
  -   SDL(gate)-sdl_index = ifp-if_index;
  +   if (gate-sa_family != AF_LINK ||
  +   gate-sa_len  sizeof(null_sdl)) {
  +   log(LOG_DEBUG, %s: bad gateway value: %s\n,
  +   __func__, ifp-if_xname);
  +   break;
  }
  +   SDL(gate)-sdl_type = ifp-if_type;
  +   SDL(gate)-sdl_index = ifp-if_index;
  if (ln != NULL)
  break;  /* This happens on a route change */
  /*
  
 



Re: snmpd: add backend from bgpd to support multiple routing tables

2014-05-13 Thread Mike Belopuhov
On Mon, Apr 28, 2014 at 14:20 +0200, Mike Belopuhov wrote:
 This adds ktable code from bgpd to fetch, store and perform lookups
 in multiple routing tables.  Currently it doesn't do anything useful
 but it's a prerequisite for any future work in this direction.
 
 OK to get this in?
 

Any objections?

 diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c
 index e157b25..e19f924 100644
 --- usr.sbin/snmpd/kroute.c
 +++ usr.sbin/snmpd/kroute.c
 @@ -45,10 +45,13 @@
  
  #include snmpd.h
  
  extern struct snmpd  *env;
  
 +struct ktable**krt;
 +u_int  krt_size;
 +
  struct {
   struct event ks_ev;
   u_long   ks_iflastchange;
   u_long   ks_nroutes;/* 4 billions enough? */
   int  ks_fd;
 @@ -77,24 +80,32 @@ struct kif_node {
  
  int  kroute_compare(struct kroute_node *, struct kroute_node *);
  int  kroute6_compare(struct kroute6_node *, struct kroute6_node *);
  int  kif_compare(struct kif_node *, struct kif_node *);
  
 -struct kroute_node   *kroute_find(in_addr_t, u_int8_t, u_int8_t);
 +void  ktable_init(void);
 +int   ktable_new(u_int, u_int);
 +void  ktable_free(u_int);
 +int   ktable_exists(u_int, u_int *);
 +struct ktable*ktable_get(u_int);
 +int   ktable_update(u_int);
 +
 +struct kroute_node   *kroute_find(struct ktable *, in_addr_t, u_int8_t,
 + u_int8_t);
  struct kroute_node   *kroute_matchgw(struct kroute_node *,
   struct sockaddr_in *);
 -int   kroute_insert(struct kroute_node *);
 -int   kroute_remove(struct kroute_node *);
 -void  kroute_clear(void);
 +int   kroute_insert(struct ktable *, struct kroute_node *);
 +int   kroute_remove(struct ktable *, struct kroute_node *);
 +void  kroute_clear(struct ktable *);
  
 -struct kroute6_node  *kroute6_find(const struct in6_addr *, u_int8_t,
 -  u_int8_t);
 +struct kroute6_node  *kroute6_find(struct ktable *, const struct in6_addr *,
 + u_int8_t, u_int8_t);
  struct kroute6_node  *kroute6_matchgw(struct kroute6_node *,
   struct sockaddr_in6 *);
 -int   kroute6_insert(struct kroute6_node *);
 -int   kroute6_remove(struct kroute6_node *);
 -void  kroute6_clear(void);
 +int   kroute6_insert(struct ktable *, struct kroute6_node *);
 +int   kroute6_remove(struct ktable *, struct kroute6_node *);
 +void  kroute6_clear(struct ktable *);
  
  struct kif_arp   *karp_find(struct sockaddr *, u_short);
  int   karp_insert(struct kif_node *, struct kif_arp *);
  int   karp_remove(struct kif_node *, struct kif_arp *);
  
 @@ -121,23 +132,21 @@ voidif_newaddr(u_short, struct sockaddr *, 
 struct sockaddr *,
   struct sockaddr *);
  void if_deladdr(u_short, struct sockaddr *, struct sockaddr *,
   struct sockaddr *);
  void if_announce(void *);
  
 -int  fetchtable(void);
 +int  fetchtable(struct ktable *);
  int  fetchifs(u_short);
 -int  fetcharp(void);
 +int  fetcharp(struct ktable *);
  void dispatch_rtmsg(int, short, void *);
  int  rtmsg_process(char *, int);
 -int  dispatch_rtmsg_addr(struct rt_msghdr *,
 +int  dispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *,
   struct sockaddr *[RTAX_MAX]);
  
 -RB_HEAD(kroute_tree, kroute_node)krt;
  RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare)
  RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare)
  
 -RB_HEAD(kroute6_tree, kroute6_node)  krt6;
  RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare)
  RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare)
  
  RB_HEAD(kif_tree, kif_node)  kit;
  RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare)
 @@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare)
  
  void
  kr_init(void)
  {
   int opt = 0, rcvbuf, default_rcvbuf;
 + unsigned inttid = RTABLE_ANY;
   socklen_t   optlen;
  
   if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
   fatal(kr_init: ioctl socket);
  
 @@ -179,31 +189,166 @@ kr_init(void)
   setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF,
   rcvbuf, sizeof(rcvbuf)) == -1  errno == ENOBUFS;
   rcvbuf /= 2)
   ;   /* nothing */
  
 - RB_INIT(krt);
 - RB_INIT(krt6);
 + if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid,
 + sizeof(tid)) == -1)
 + log_warn(kr_init: setsockopt AF_ROUTE

Re: MI MTU size for lo(4)

2014-05-13 Thread Mike Belopuhov
On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote:
 With KAME the MTU size of the loopback interface became strange and is
 actually dependend on the architecture. I see no point in all this
 just go back to the way it was long long long ago and just use 32k as the
 MTU. AFAIK all of this was only done to test large IPv6 packets but why
 MHLEN and MLEN were added is still beyond my imagination.


In fact this comes from this commit in NetBSD originally:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20

Comment says Add MHLEN + MLEN extra space to LOMTU for IP and
transport headers.  This doesn't make me understand why this
is needed however.  So unless this clues you in, I'm OK with
setting it to 32k everywhere.



Re: MI MTU size for lo(4)

2014-05-13 Thread Mike Belopuhov
On 13 May 2014 16:05, Mike Belopuhov m...@belopuhov.com wrote:
 On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote:
 With KAME the MTU size of the loopback interface became strange and is
 actually dependend on the architecture. I see no point in all this
 just go back to the way it was long long long ago and just use 32k as the
 MTU. AFAIK all of this was only done to test large IPv6 packets but why
 MHLEN and MLEN were added is still beyond my imagination.


 In fact this comes from this commit in NetBSD originally:
 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20

 Comment says Add MHLEN + MLEN extra space to LOMTU for IP and
 transport headers.  This doesn't make me understand why this
 is needed however.  So unless this clues you in, I'm OK with
 setting it to 32k everywhere.

And purely for your amusement: FreeBSD have decreased MTU from
64k to 16k in '95:

http://svnweb.freebsd.org/base/head/sys/net/if_loop.c?r1=6875r2=6876

Perhaps (and I'm somewhat speculating here) because of TCP
performance problems, like so:

http://marc.info/?l=netbsd-current-usersm=102684186303106w=2
(fixed diff is in the next mail though)



Re: m-m_pkthdr.rcvif and ip6_input()

2014-05-12 Thread Mike Belopuhov
On 12 May 2014 15:12, Martin Pieuchot mpieuc...@nolizard.org wrote:
 Like the previous diffs, it reduces the number of m-m_pkthdr.rcvif
 occurrences, this time in ip6_input().  Should be no functional change.

 Ok?


OK



Re: Annoying emacs variable in if_spppsubr.c

2014-05-02 Thread Mike Belopuhov
On 2 May 2014 12:09, Jérémie Courrèges-Anglas j...@wxcvbn.org wrote:

 This one is bugging me each time I start my Emacs session (because Emacs
 now asks confirmation for most variables).  This one would be useful
 only with hilit19.el (obsolete) from editors/emacs21... if the size of
 the file was actually under 12 bytes (it's not anymore).

 ok to kill it?


yes, please.



Re: data modified on freelist, tmpfs-related?

2014-04-30 Thread Mike Belopuhov
On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST)
 From: Mark Kettenis mark.kette...@xs4all.nl

  Date: Wed, 30 Apr 2014 13:39:20 +0100
  From: Stuart Henderson st...@openbsd.org
 
  Seen when running e2fsprogs regression tests with /tmp on tmpfs

 I'm not surprised; tmpfs contains some serious bugs.  I recommend not
 using it until those are fixed.

 Which means, I'd like somebody else besides espie@ to comment on my
 uvm_aobj.c list manipulation hack diff.


Diff made sense to me when I looked at it, but I would rather hide
direct pointer access :/  Perhaps LIST_SWAP does a tiny bit more,
but it's cleaner and perhaps can be useful in the future.



Re: data modified on freelist, tmpfs-related?

2014-04-30 Thread Mike Belopuhov
On 30 April 2014 16:55, Mark Kettenis mark.kette...@xs4all.nl wrote:
 From: Mike Belopuhov m...@belopuhov.com
 Date: Wed, 30 Apr 2014 16:00:45 +0200

 On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote:
  Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST)
  From: Mark Kettenis mark.kette...@xs4all.nl
 
   Date: Wed, 30 Apr 2014 13:39:20 +0100
   From: Stuart Henderson st...@openbsd.org
  
   Seen when running e2fsprogs regression tests with /tmp on tmpfs
 
  I'm not surprised; tmpfs contains some serious bugs.  I recommend not
  using it until those are fixed.
 
  Which means, I'd like somebody else besides espie@ to comment on my
  uvm_aobj.c list manipulation hack diff.
 

 Diff made sense to me when I looked at it, but I would rather hide
 direct pointer access :/  Perhaps LIST_SWAP does a tiny bit more,
 but it's cleaner and perhaps can be useful in the future.

 I'm not comfortable with introducing more sys/queue.h APIs.  So
 perhaps we should just punt on the optimization and remove/insert all
 list items.  Removing the trap comments that pedro set up...


I agree.



snmpd: add backend from bgpd to support multiple routing tables

2014-04-28 Thread Mike Belopuhov
This adds ktable code from bgpd to fetch, store and perform lookups
in multiple routing tables.  Currently it doesn't do anything useful
but it's a prerequisite for any future work in this direction.

OK to get this in?

diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c
index e157b25..e19f924 100644
--- usr.sbin/snmpd/kroute.c
+++ usr.sbin/snmpd/kroute.c
@@ -45,10 +45,13 @@
 
 #include snmpd.h
 
 extern struct snmpd*env;
 
+struct ktable  **krt;
+u_intkrt_size;
+
 struct {
struct event ks_ev;
u_long   ks_iflastchange;
u_long   ks_nroutes;/* 4 billions enough? */
int  ks_fd;
@@ -77,24 +80,32 @@ struct kif_node {
 
 intkroute_compare(struct kroute_node *, struct kroute_node *);
 intkroute6_compare(struct kroute6_node *, struct kroute6_node *);
 intkif_compare(struct kif_node *, struct kif_node *);
 
-struct kroute_node *kroute_find(in_addr_t, u_int8_t, u_int8_t);
+voidktable_init(void);
+int ktable_new(u_int, u_int);
+voidktable_free(u_int);
+int ktable_exists(u_int, u_int *);
+struct ktable  *ktable_get(u_int);
+int ktable_update(u_int);
+
+struct kroute_node *kroute_find(struct ktable *, in_addr_t, u_int8_t,
+   u_int8_t);
 struct kroute_node *kroute_matchgw(struct kroute_node *,
struct sockaddr_in *);
-int kroute_insert(struct kroute_node *);
-int kroute_remove(struct kroute_node *);
-voidkroute_clear(void);
+int kroute_insert(struct ktable *, struct kroute_node *);
+int kroute_remove(struct ktable *, struct kroute_node *);
+voidkroute_clear(struct ktable *);
 
-struct kroute6_node*kroute6_find(const struct in6_addr *, u_int8_t,
-u_int8_t);
+struct kroute6_node*kroute6_find(struct ktable *, const struct in6_addr *,
+   u_int8_t, u_int8_t);
 struct kroute6_node*kroute6_matchgw(struct kroute6_node *,
struct sockaddr_in6 *);
-int kroute6_insert(struct kroute6_node *);
-int kroute6_remove(struct kroute6_node *);
-voidkroute6_clear(void);
+int kroute6_insert(struct ktable *, struct kroute6_node *);
+int kroute6_remove(struct ktable *, struct kroute6_node *);
+voidkroute6_clear(struct ktable *);
 
 struct kif_arp *karp_find(struct sockaddr *, u_short);
 int karp_insert(struct kif_node *, struct kif_arp *);
 int karp_remove(struct kif_node *, struct kif_arp *);
 
@@ -121,23 +132,21 @@ void  if_newaddr(u_short, struct sockaddr *, 
struct sockaddr *,
struct sockaddr *);
 void   if_deladdr(u_short, struct sockaddr *, struct sockaddr *,
struct sockaddr *);
 void   if_announce(void *);
 
-intfetchtable(void);
+intfetchtable(struct ktable *);
 intfetchifs(u_short);
-intfetcharp(void);
+intfetcharp(struct ktable *);
 void   dispatch_rtmsg(int, short, void *);
 intrtmsg_process(char *, int);
-intdispatch_rtmsg_addr(struct rt_msghdr *,
+intdispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *,
struct sockaddr *[RTAX_MAX]);
 
-RB_HEAD(kroute_tree, kroute_node)  krt;
 RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare)
 RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare)
 
-RB_HEAD(kroute6_tree, kroute6_node)krt6;
 RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare)
 RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare)
 
 RB_HEAD(kif_tree, kif_node)kit;
 RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare)
@@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare)
 
 void
 kr_init(void)
 {
int opt = 0, rcvbuf, default_rcvbuf;
+   unsigned inttid = RTABLE_ANY;
socklen_t   optlen;
 
if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
fatal(kr_init: ioctl socket);
 
@@ -179,31 +189,166 @@ kr_init(void)
setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF,
rcvbuf, sizeof(rcvbuf)) == -1  errno == ENOBUFS;
rcvbuf /= 2)
;   /* nothing */
 
-   RB_INIT(krt);
-   RB_INIT(krt6);
+   if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid,
+   sizeof(tid)) == -1)
+   log_warn(kr_init: setsockopt AF_ROUTE ROUTE_TABLEFILTER);
+
RB_INIT(kit);
RB_INIT(kat);
 
if 

Re: iked + isakmpd on the same machine

2014-04-24 Thread Mike Belopuhov
On 24 April 2014 12:12, Philipp
e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote:
 Am 22.04.2014 17:28 schrieb Mike Belopuhov:

 more like it's not supported and is not supposed to work.

 not supposed as in 'not wanted'?


not supposed.


 it's like running nginx and apache at the same time but

 Quite frankly: I'm doing that in some locations ;-)


not on the same port (80) though.  ikev2 and isakmp both use
same udp ports (500 and 4500).


 worse since there are kernel tentacles involved as well
 (as you might have figured out already) that will likely

 That's somehow the main problem, the two daemons are not
 trying to share the pfkey2 ioctls outcome.

i don't see it like that.

 So, I can wait til iked supports ikev1, too.

there are no current plans to implement ikev1 support that
i'm aware of.

 Using a different machine will be quite painful at the moment.
 Rock+hard place.


 prevent you from doing that on the same box but different
 ip addresses.

 Nevertheless I'd say that a Listen-on style directive for iked
 would a useful thing[tm], e.g. to default the srcid to that.


perhaps.  currently i believe srcid will default to local if
specified.

 Cheers.




Re: Kill in_localaddr()

2014-04-24 Thread Mike Belopuhov
On 24 April 2014 16:41, Martin Pieuchot mpieuc...@nolizard.org wrote:
 in_localaddr() is used only once in our tree and only if the sysctl
 net.inet.ip.mtudisc is set to 0.

 It is used to optimize the size of the MSS if the forward address
 correspond to a host on one of our subnets.  Since it's an
 optimization for a special case that's not enabled by default, I'd
 like to  kill it to remove one usage of the global list of IPv4
 addresses.

 While here get rid of the #ifdef RTV_MTU, it is here.

 ok?


OK



Re: iked + isakmpd on the same machine

2014-04-24 Thread Mike Belopuhov
On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote:
 Mike Belopuhov [m...@belopuhov.com] wrote:

 more like it's not supported and is not supposed to work.
 it's like running nginx and apache at the same time

 hey, nginx and httpd run concurrently quite fine on
 different IP addresses, same box :)

i meant using the same port numbers of course.



Re: iked + isakmpd on the same machine

2014-04-24 Thread Mike Belopuhov
On 24 April 2014 22:25, Alexander Hall alexan...@beard.se wrote:
 On 04/24/14 21:53, Stuart Henderson wrote:

 On 2014/04/24 20:30, Mike Belopuhov wrote:

 On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote:

 Mike Belopuhov [m...@belopuhov.com] wrote:


 more like it's not supported and is not supposed to work.
 it's like running nginx and apache at the same time


 hey, nginx and httpd run concurrently quite fine on
 different IP addresses, same box :)


 i meant using the same port numbers of course.


 they can do that fine too! :) just have one hand-off the relevant
 requests to the other.


 If they bind to separate IP addresses that is obviously not a problem, even
 for the same port numbers.

yes. that's precisely what i meant:  you can't bind to the same ipaddr:port
pair twice.   why do i have to chew it and spit it out for you.  it was clear
what i meant from the start.



Re: IPv6 DoS sysctl man page additions

2014-04-22 Thread Mike Belopuhov
On 19 April 2014 13:20, Loganaden Velvindron lo...@elandsys.com wrote:
 On Sat, Apr 19, 2014 at 04:04:30AM -0700, Loganaden Velvindron wrote:
 Hi All,

 I'm taking a short break from playing with pf statistics.

 There were 4 sysctls added from KAME, but the man pages weren't updated
 accordingly.

 (Adapted from the NetBSD man page changes)

 Feedback welcomed.



 Removed trailing spaces and use set to instead of set it to based
 on feedback from sthen@



OK mikeb



Re: iked + isakmpd on the same machine

2014-04-22 Thread Mike Belopuhov
On 22 April 2014 17:13, Philipp
e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote:
 It happened! A remote peer *requires* IKEv2 - and I've to do that on a
 machine running isakmpd with somewhat 25+ IKEv1 peers.

 First hurdle: I cannot bind iked to a certain (carp) IP-address. Mad
 workaround: start isakmpd (with Listen-on) first.
 Second hurdle: iked loads its SAs and eventually does this by creating a
 new empty SADB, effectivly killing all
 the SAs isakmpd loaded into the kernel before?

 Is there a diff sleeping out there for tackling the first hurdle?

 For the second one, I've to refrain from testing this in live further more.
 First to reconstruct my Frankenstein-Lab.

 Cheers for any thoughts beside mad, bro? :-)


more like it's not supported and is not supposed to work.
it's like running nginx and apache at the same time but
worse since there are kernel tentacles involved as well
(as you might have figured out already) that will likely
prevent you from doing that on the same box but different
ip addresses.

cheers,
mike



Re: snmpd: support for ipNetToMediaTable (ARP table exporting)

2014-04-10 Thread Mike Belopuhov
On Mon, Apr 07, 2014 at 17:03 +0200, Mike Belopuhov wrote:
 a bit of an update, mainly style changes.  one functional change:
 don't rely on rtm_rmx.rmx_expire to set the F_STATIC flag as
 rt_getmetrics is not called consistenly (only with RTM_GETs) and
 besides RTF_STATIC flag is already present for static ARP entries.
 
 http://www.vantronix.net/~mike/snmpd-arp.diff

I've ditched rdomain kludges to simplify the diff and because actual
rdomain support doesn't need any of those.

OK?

diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c
index 1ed4d17..e157b25 100644
--- usr.sbin/snmpd/kroute.c
+++ usr.sbin/snmpd/kroute.c
@@ -69,10 +69,11 @@ struct kroute6_node {
 };
 
 struct kif_node {
RB_ENTRY(kif_node)   entry;
TAILQ_HEAD(, kif_addr)   addrs;
+   TAILQ_HEAD(, kif_arp)arps;
struct kif   k;
 };
 
 intkroute_compare(struct kroute_node *, struct kroute_node *);
 intkroute6_compare(struct kroute6_node *, struct kroute6_node *);
@@ -91,10 +92,14 @@ struct kroute6_node *kroute6_matchgw(struct kroute6_node *,
struct sockaddr_in6 *);
 int kroute6_insert(struct kroute6_node *);
 int kroute6_remove(struct kroute6_node *);
 voidkroute6_clear(void);
 
+struct kif_arp *karp_find(struct sockaddr *, u_short);
+int karp_insert(struct kif_node *, struct kif_arp *);
+int karp_remove(struct kif_node *, struct kif_arp *);
+
 struct kif_node*kif_find(u_short);
 struct kif_node*kif_insert(u_short);
 int kif_remove(struct kif_node *);
 voidkif_clear(void);
 struct kif *kif_update(u_short, int, struct if_data *,
@@ -118,10 +123,11 @@ void  if_deladdr(u_short, struct sockaddr *, 
struct sockaddr *,
struct sockaddr *);
 void   if_announce(void *);
 
 intfetchtable(void);
 intfetchifs(u_short);
+intfetcharp(void);
 void   dispatch_rtmsg(int, short, void *);
 intrtmsg_process(char *, int);
 intdispatch_rtmsg_addr(struct rt_msghdr *,
struct sockaddr *[RTAX_MAX]);
 
@@ -182,10 +188,12 @@ kr_init(void)
 
if (fetchifs(0) == -1)
fatalx(kr_init fetchifs);
if (fetchtable() == -1)
fatalx(kr_init fetchtable);
+   if (fetcharp() == -1)
+   fatalx(kr_init fetcharp);
 
event_set(kr_state.ks_ev, kr_state.ks_fd, EV_READ | EV_PERSIST,
dispatch_rtmsg, NULL);
event_add(kr_state.ks_ev, NULL);
 }
@@ -519,10 +527,123 @@ kroute6_clear(void)
 
while ((kr = RB_MIN(kroute6_tree, krt6)) != NULL)
kroute6_remove(kr);
 }
 
+static inline int
+karp_compare(struct kif_arp *a, struct kif_arp *b)
+{
+   /* Interface indices are assumed equal */
+   if (ntohl(a-addr.sin.sin_addr.s_addr) 
+   ntohl(b-addr.sin.sin_addr.s_addr))
+   return (1);
+   if (ntohl(a-addr.sin.sin_addr.s_addr) 
+   ntohl(b-addr.sin.sin_addr.s_addr))
+   return (-1);
+   return (0);
+}
+
+static inline struct kif_arp *
+karp_search(struct kif_node *kn, struct kif_arp *ka)
+{
+   struct kif_arp  *pivot;
+
+   TAILQ_FOREACH(pivot, kn-arps, entry) {
+   switch (karp_compare(ka, pivot)) {
+   case 0: /* found */
+   return (pivot);
+   case -1: /* ka  pivot, end the search */
+   return (NULL);
+   }
+   }
+   /* looped through the whole list and didn't find */
+   return (NULL);
+}
+
+struct kif_arp *
+karp_find(struct sockaddr *sa, u_short ifindex)
+{
+   struct kif_node *kn;
+   struct kif_arp  *ka = NULL, s;
+
+   memcpy(s.addr.sa, sa, sa-sa_len);
+
+   if (ifindex == 0) {
+   /*
+* We iterate manually to handle zero ifindex special
+* case differently from kif_find, in particular we
+* want to look for the address on all available
+* interfaces.
+*/
+   RB_FOREACH(kn, kif_tree, kit) {
+   if ((ka = karp_search(kn, s)) != NULL)
+   break;
+   }
+   } else {
+   if ((kn = kif_find(ifindex)) == NULL)
+   return (NULL);
+   ka = karp_search(kn, s);
+   }
+   return (ka);
+}
+
+int
+karp_insert(struct kif_node *kn, struct kif_arp *ka)
+{
+   struct kif_arp  *pivot;
+
+   if (ka-if_index == 0)
+   return (-1);
+   if (!kn  (kn = kif_find(ka-if_index)) == NULL)
+   return (-1);
+   /* Put entry on the list in the ascending lexical order */
+   TAILQ_FOREACH(pivot, kn-arps, entry) {
+   switch

<    1   2   3   4   5   6   7   8   9   10   >