Re: unveil(2) tcpdump(8)
Hi tech@ I've commited this, please test it as much as possible by applying the diff right now or just wait for the next snapshot. Let it run for a long time and let me know of any problems as soon as you get any. For this your ktrace, pcap and coredump files will be VERY important for further analysis of the issue. As noted in sysctl(8)'s man in order to generate the dumps in a safe place please do the following: # mkdir -m 700 /var/crash/tcpdump # sysctl kern.nosuidcoredump=3 If someone has weird networks then even better for testing it since mine is pretty much vanilla and might not have been enough to see flaws with the diff. I can't stress this enough, please test it and report it ASAP either here or in private. On 18:08 Wed 26 Sep , Ricardo Mestre wrote: > I'm not too worried about the crash, I was playing with removing rpath > from pledge in the case that /etc/ethers wasn't need which sure enough > it is. pledge was most likely the reason that made it crash. > > But brynet@ pointed out that while he was working on tcpdump last year > he saw that it also needs to open /etc/rpc, and yes it calls > getrpcbynumber(3) so an updated diff is below. > > Index: privsep.c > === > RCS file: /cvs/src/usr.sbin/tcpdump/privsep.c,v > retrieving revision 1.48 > diff -u -p -u -r1.48 privsep.c > --- privsep.c 8 Aug 2018 22:57:12 - 1.48 > +++ privsep.c 26 Sep 2018 17:04:25 - > @@ -207,7 +207,7 @@ __dead void > priv_exec(int argc, char *argv[]) > { > int bpfd = -1; > - int i, sock, cmd, nflag = 0, Pflag = 0; > + int i, sock, cmd, nflag = 0, oflag = 0, Pflag = 0; > char *cmdbuf, *infile = NULL; > char *RFileName = NULL; > char *WFileName = NULL; > @@ -229,6 +229,10 @@ priv_exec(int argc, char *argv[]) > nflag++; > break; > > + case 'o': > + oflag = 1; > + break; > + > case 'r': > RFileName = optarg; > break; > @@ -305,6 +309,14 @@ priv_exec(int argc, char *argv[]) > test_state(cmd, STATE_RUN); > impl_init_done(sock, &bpfd); > > + if (oflag) { > + if (unveil("/etc/pf.os", "r") == -1) > + err(1, "unveil"); > + } > + if (unveil("/etc/ethers", "r") == -1) > + err(1, "unveil"); > + if (unveil("/etc/rpc", "r") == -1) > + err(1, "unveil"); > if (pledge("stdio rpath inet dns recvfd bpf", NULL) == > -1) > err(1, "pledge"); > > > On 08:58 Wed 26 Sep , Theo de Raadt wrote: > > Ricardo Mestre wrote: > > > > > Hi, > > > > > > This has been shown internally for some time, but deraadt@ asked me to > > > show it > > > to a bigger audience now so here it is! > > > > > > If we want OS fingerprinting by using -o flag then we can unveil > > > /etc/pf.os in > > > read mode, nevertheless in order to do this we need to inform the privsep > > > proc > > > that we are using -o so I added it to priv_exec(). > > > > looks right > > > > > The other file needed to be unveiled is /etc/ethers in read mode, which I > > > tried > > > to make it conditional but after several successful tests I bumped into a > > > packet which made tcpdump crash after some time. Unfortunately I don't > > > have the > > > core nor the pcap files to investigate what happen so for now the unveil > > > of > > > this file will be kept unconditional regardless of the flags or expression > > > used. > > > > It is very likely that a protocol parser will find a MAC address nested > > inside, and try to parse it via /etc/ethers. So makes sense, and there > > is little risk in exposing ethers > > > > There is far more risk that some other file is required, and will fail to > > open silently, and tcpdump willbehave differently > > > > You know well: unveil requires a different audit approach than pledge > > > > But it is rare for a missing file via unveil to crash a program, so > > something > > is fishy. A crash isn't good, perhaps try to reproduce and collect a > > corefile > > using the sysctl kern.nosuidcoredump=3 method. > > >
Re: add vlan and trunk to arm64 RAMDISK
That is exactly how I like to see this displayed, all the details are there. i find it odd the rd0a didn't change that much in size. Compression is really helping it. ok deraadt > Howdy. > > Attached is a patch to add vlan and trunk to arm64's RAMDISK (parity > with amd64). > > make release'ed and tested, size increase as follows: > > -rw--- 1 los los12940598 Sep 27 21:01 bsd.rd-patch > > -rw--- 1 los los12864682 Sep 27 05:41 bsd.rd-snap > > > ~74KB difference > > Snap - OpenBSD 6.4-beta (RAMDISK) #475: Wed Sep 26 13:15:25 MDT 2018 > /dev/vnd1c 7.7M7.3M382K95%/mnt > /dev/vnd1c 7907 7526 38195% 1721874 8% /mnt > > /dev/rd0a15815 15055 76095%/ > /dev/rd0a15815 15055 76095% 1751871 9% / > > Patch > > /dev/vnd0c 7.7M7.3M382K95%/mnt > /dev/vnd0c 7907 7526 38195% 1721874 8% /mnt > > /dev/rd0a15815 15055 76095%/ > /dev/rd0a15815 15055 76095% 1751871 9% / > > Ok? > > +--+ > Carlos > Index: RAMDISK > === > RCS file: /cvs/src/sys/arch/arm64/conf/RAMDISK,v > retrieving revision 1.74 > diff -u -p -r1.74 RAMDISK > --- RAMDISK 18 Sep 2018 13:45:09 - 1.74 > +++ RAMDISK 28 Sep 2018 04:48:57 - > @@ -251,6 +251,8 @@ islrtc* at iic? # ISL1208 RTC > rkpmic* at iic? # RK808 PMIC > > pseudo-deviceloop 1 > +pseudo-device vlan > +pseudo-device trunk > pseudo-devicebpfilter 1 > pseudo-devicerd 1 > pseudo-devicebio 1
add vlan and trunk to arm64 RAMDISK
Howdy. Attached is a patch to add vlan and trunk to arm64's RAMDISK (parity with amd64). make release'ed and tested, size increase as follows: -rw--- 1 los los12940598 Sep 27 21:01 bsd.rd-patch -rw--- 1 los los12864682 Sep 27 05:41 bsd.rd-snap ~74KB difference Snap - OpenBSD 6.4-beta (RAMDISK) #475: Wed Sep 26 13:15:25 MDT 2018 /dev/vnd1c 7.7M7.3M382K95%/mnt /dev/vnd1c 7907 7526 38195% 1721874 8% /mnt /dev/rd0a15815 15055 76095%/ /dev/rd0a15815 15055 76095% 1751871 9% / Patch /dev/vnd0c 7.7M7.3M382K95%/mnt /dev/vnd0c 7907 7526 38195% 1721874 8% /mnt /dev/rd0a15815 15055 76095%/ /dev/rd0a15815 15055 76095% 1751871 9% / Ok? +--+ Carlos Index: RAMDISK === RCS file: /cvs/src/sys/arch/arm64/conf/RAMDISK,v retrieving revision 1.74 diff -u -p -r1.74 RAMDISK --- RAMDISK 18 Sep 2018 13:45:09 - 1.74 +++ RAMDISK 28 Sep 2018 04:48:57 - @@ -251,6 +251,8 @@ islrtc* at iic? # ISL1208 RTC rkpmic*at iic? # RK808 PMIC pseudo-device loop 1 +pseudo-device vlan +pseudo-device trunk pseudo-device bpfilter 1 pseudo-device rd 1 pseudo-device bio 1
Re: pfsync: avoid a recursion on PF_LOCK
On Thu, Sep 27, 2018 at 11:30:09PM +0200, Hrvoje Popovski wrote: > On 27.9.2018. 18:34, Alexandr Nedvedicky wrote: > > Mentioning parallelism: there is yet another change you need to perform > > in order to get more pf_test() instances running. Currently there > > is only single input task, which processes inbound packets. In order > > to allow more input tasks one has to change NET_TASKQ constant found > > in net/if.c. Patch below tells kernel to use two input tasks: > > shouldn't this be - Patch below tells kernel to use four input tasks? absolutely correct, it should be four. The patch below makes kernel to use 4 input tasks. regards sashan > > > > 8<---8<---8<--8< > > diff --git a/sys/net/if.c b/sys/net/if.c > > index 78d7d3f3af6..87fca3921f1 100644 > > --- a/sys/net/if.c > > +++ b/sys/net/if.c > > @@ -232,7 +232,7 @@ int ifq_congestion; > > > > int netisr; > > > > -#defineNET_TASKQ 1 > > +#defineNET_TASKQ 4 > > struct taskq *nettqmp[NET_TASKQ]; > > > > struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); > > 8<---8<---8<--8< >
Re: ftp -w is not dying
On Thu, Sep 27, 2018 at 11:59 AM sven falempin wrote: > > > On Thu, Sep 27, 2018 at 10:47 AM sven falempin > wrote: > >> I m not sure how this is possible but here s the data : >> >> i used the ENV to push -w 5 in my pkg_add process : >> # date >> Thu Sep 27 10:40:28 EDT 2018 >> # ps auxww | grep pkgfet >> _pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5 >> -S session -o - https:// >> myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz >> # fstat -p 60348 >> _pkgfetc ftp 60348 5* internet stream tcp 0x806e8548 >> 172.16.1.35:5512 --> 92.222.70.241:443 >> >> I can see the PF state on the natting device: >> >> tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443 >> ESTABLISHED:ESTABLISHED >> age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a >> rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged >> READY_TO_NAT >> >> this is stock 6.3 ftp which is still the same now ( >> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ ) >> >> # ktrace -p 60348 >> generate an empty 64 bit file. >> >> Any other test i could do to understand ? >> >> > Reading NC it seems that the 'poll' syscall is indeed required to timeout correctly. I m trying to find a way to recreate the problem and i am running a modified version in a loop My understanding of the tls_read(SSL_read then ssl_read) .. is that the retry is for SSL protocol error and retry which would only occur on non TCP layer. That's irrelevant in fetch.c I m just guessing the ( at the top of ssl_read ) call will write and fix an POLLIN when write is done. This explaining trying to read again when a write is requested >.< Nevertheless each read/write code ( and even close ) shall be precede by a poll call. Bellow is the patch i m testing at the moment , i do not quite like it, it s not solving the problem, and show how the factorization of read is making the code hard to read/modify IMHO url_get should be just calling url_get_https url_get_http url_get_ftp and each of this function do only that. The _fin_ FILE* may be NULL fd may be -1 randomly in the function, it s kinda messy. And this make it difficult to nicely call poll before reading to avoid being blocked in a read indefinitely RCS file: /cvs/src/usr.bin/ftp/fetch.c,v retrieving revision 1.167 diff -u -p -r1.167 fetch.c --- fetch.c 10 Feb 2018 06:25:16 - 1.167 +++ fetch.c 27 Sep 2018 20:10:52 - @@ -59,6 +59,7 @@ #include #ifndef NOSSL +#include #include #else /* !NOSSL */ struct tls; @@ -73,12 +74,12 @@ voidabortfile(int); char hextochar(const char *); char *urldecode(const char *); char *recode_credentials(const char *_userinfo); -intftp_printf(FILE *, struct tls *, const char *, ...) __attribute__((format(printf, 3, 4))); +intftp_printf(FILE *, int fd, struct tls *, const char *, ...) __attribute__((format(printf, 4, 5))); char *ftp_readline(FILE *, struct tls *, size_t *); size_t ftp_read(FILE *, struct tls *, char *, size_t); #ifndef NOSSL intproxy_connect(int, char *, char *); -intSSL_vprintf(struct tls *, const char *, va_list); +intSSL_vprintf(int fd, struct tls *, const char *, va_list); char *SSL_readline(struct tls *, size_t *); #endif /* !NOSSL */ @@ -98,6 +99,90 @@ jmp_buf httpabort; static int redirect_loop; +#ifndef NOSSL +/*from netcat.c correct way to timeout a tls_call*/ +int +timeout_tls(int s, struct tls *tls_ctx, int (*func)(struct tls *)) +{ + int timeout = connect_timeout; // glu + struct pollfd pfd; + int ret; + + while ((ret = (*func)(tls_ctx)) != 0) { + if (ret == TLS_WANT_POLLIN) + pfd.events = POLLIN; + else if (ret == TLS_WANT_POLLOUT) + pfd.events = POLLOUT; + else + break; + pfd.fd = s; + if ((ret = poll(&pfd, 1, timeout)) == 1) + continue; + else if (ret == 0) { + errno = ETIMEDOUT; + ret = -1; + break; + } else + err(1, "poll failed"); + } + + return ret; +} + +/*must realloc / offset*/ +int +timeout_tls_write(int s, struct tls *tls_ctx, + const void *buf, size_t buflen) +{ + int timeout = connect_timeout; // glu + struct pollfd pfd; + int ret; + + // should check ready first => do poll write then write + while ((ret = tls_write(tls_ctx,buf,buflen)) != 0) { + if (ret == TLS_WANT_POLLIN) + pfd.events = POLLIN; + else if (ret == TLS_WANT_POLLOUT) + pfd.events = POLLOUT; + else + break; + pfd.fd = s; + if ((ret = p
Re: pfctl(8) and securelevel(7)
On Thu, Sep 27, 2018 at 03:35:30PM +0200, Zbyszek Żółkiewski wrote: > sorry, forgot mention: 6.3 -stable Same on -CURRENT and probably older releases as well. > to reproduce: > - at securelevel=1 > - load pf.conf - file whitelist is populated with IP addresses > - try to list table: pfctl -t whitelist -T show > - will all work as expected > - set securelevel=2 (sysctl kern.securelevel=2) > - repeat command: pfctl -t whitelist -T show > - this result in "Operation not permitted” > - now try: pfctl -t whitelist -v -T show > - this will result with printed table contents as well as some stats `-v' uses a different ioctl, see sbin/pfctl/pfctl_table.c:306. # ktrace pfctl -t whitelist -T show # kdump | grep ioctl 61090 pfctlCALL ioctl(3,DIOCRGETADDRS,0x7f7eb8b0) 61090 pfctlRET ioctl -1 errno 1 Operation not permitted # ktrace pfctl -t whitelist -T show -v # kdump |grep ioctl 99126 pfctlCALL ioctl(3,DIOCRGETASTATS,0x7f7d5aa0) 99126 pfctlRET ioctl 0 The securelevel check for PF ioctls simply doesn't account for the first one. Grepping for DIOCRGETADDRS consumers in base only shows pfctl and I currently don't have any concerns in allowing this ioctl when securelevel is >1, but review/confirmation is welcome here. Feedback? OK? Index: net/pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.337 diff -u -p -r1.337 pf_ioctl.c --- net/pf_ioctl.c 11 Sep 2018 07:53:38 - 1.337 +++ net/pf_ioctl.c 27 Sep 2018 18:29:25 - @@ -951,6 +951,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a case DIOCRADDADDRS: case DIOCRDELADDRS: case DIOCRSETADDRS: + case DIOCRGETADDRS: case DIOCRGETASTATS: case DIOCRCLRASTATS: case DIOCRTSTADDRS:
vmd.c copy-pasto
diff --git a/usr.sbin/vmd/vmd.c b/usr.sbin/vmd/vmd.c index 25d19dc7a7f..2694c3111ac 100644 --- a/usr.sbin/vmd/vmd.c +++ b/usr.sbin/vmd/vmd.c @@ -1861,7 +1861,7 @@ user_checklimit(struct vmd_user *usr, struct vm_create_params *vcp) limit = "cpu "; goto fail; } - if (usr->usr_maxcpu > VM_DEFAULT_USER_MAXMEM) { + if (usr->usr_maxmem > VM_DEFAULT_USER_MAXMEM) { limit = "memory "; goto fail; } -- nest.cx is Gmail hosted, use PGP for anything private. Key: http://goo.gl/6dMsr Fingerprint: 5E2B 2D0E 1E03 2046 BEC3 4D50 0B15 42BD 8DF5 A1B0
nsd 4.1.25
unexciting update to 4.1.25, running in production in front of a powerdns signer without issues. OK? diff --git config.h.in config.h.in index eded09dd6b3..4d47f603062 100644 --- config.h.in +++ config.h.in @@ -1,5 +1,8 @@ /* config.h.in. Generated from configure.ac by autoheader. */ +/* apply the noreturn attribute to a function that exits the program */ +#undef ATTR_NORETURN + /* Define this to enable BIND8 like NSTATS & XSTATS. */ #undef BIND8_STATS @@ -43,6 +46,9 @@ /* Whether the C compiler accepts the "format" attribute */ #undef HAVE_ATTR_FORMAT +/* Whether the C compiler accepts the "noreturn" attribute */ +#undef HAVE_ATTR_NORETURN + /* Whether the C compiler accepts the "unused" attribute */ #undef HAVE_ATTR_UNUSED diff --git configparser.y configparser.y index 547518f88c6..567641ce706 100644 --- configparser.y +++ configparser.y @@ -154,9 +154,6 @@ server_debug_mode: VAR_DEBUG_MODE STRING server_use_systemd: VAR_USE_SYSTEMD STRING { OUTYY(("P(server_use_systemd:%s)\n", $2)); - if(strcmp($2, "yes") != 0 && strcmp($2, "no") != 0) - yyerror("expected yes or no."); - else cfg_parser->opt->use_systemd = (strcmp($2, "yes")==0); } ; server_verbosity: VAR_VERBOSITY STRING diff --git configure configure index 13da401ab9b..bedd3930dbf 100644 --- configure +++ configure @@ -1,6 +1,6 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69 for NSD 4.1.24. +# Generated by GNU Autoconf 2.69 for NSD 4.1.25. # # Report bugs to . # @@ -580,8 +580,8 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='NSD' PACKAGE_TARNAME='nsd' -PACKAGE_VERSION='4.1.24' -PACKAGE_STRING='NSD 4.1.24' +PACKAGE_VERSION='4.1.25' +PACKAGE_STRING='NSD 4.1.25' PACKAGE_BUGREPORT='nsd-b...@nlnetlabs.nl' PACKAGE_URL='' @@ -734,7 +734,6 @@ enable_minimal_responses enable_mmap enable_radix_tree enable_packed -enable_systemd ' ac_precious_vars='build_alias host_alias @@ -1287,7 +1286,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures NSD 4.1.24 to adapt to many kinds of systems. +\`configure' configures NSD 4.1.25 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1348,7 +1347,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of NSD 4.1.24:";; + short | recursive ) echo "Configuration of NSD 4.1.25:";; esac cat <<\_ACEOF @@ -1387,7 +1386,6 @@ Optional Features: less memory, but uses some more CPU. --enable-packed Enable packed structure alignment, uses less memory, but unaligned reads. - --enable-systemdcompile with systemd support Optional Packages: --with-PACKAGE[=ARG]use PACKAGE [ARG=yes] @@ -1498,7 +1496,7 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -NSD configure 4.1.24 +NSD configure 4.1.25 generated by GNU Autoconf 2.69 Copyright (C) 2012 Free Software Foundation, Inc. @@ -2207,7 +2205,7 @@ cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by NSD $as_me 4.1.24, which was +It was created by NSD $as_me 4.1.25, which was generated by GNU Autoconf 2.69. Invocation command line was $ $0 $@ @@ -4987,6 +4985,7 @@ fi + # Checks for typedefs, structures, and compiler characteristics. # allow user to override the -g -O2 flags. if test "x$CFLAGS" = "x" ; then @@ -5456,6 +5455,49 @@ $as_echo "#define HAVE_ATTR_UNUSED 1" >>confdefs.h fi + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler (${CC-cc}) accepts the \"noreturn\" attribute" >&5 +$as_echo_n "checking whether the C compiler (${CC-cc}) accepts the \"noreturn\" attribute... " >&6; } +if ${ac_cv_c_noreturn_attribute+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_cv_c_noreturn_attribute=no +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + #include +__attribute__((noreturn)) void f(int x) { printf("%d", x); } + +int +main () +{ + + f(1); + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_c_noreturn_attribute="yes" +else + ac_cv_c_noreturn_attribute="no" +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_noreturn_attribute" >&5 +$as_echo "$ac_cv_c_noreturn_attribute" >&6; } +if test $ac_cv_c_noreturn_attribute = yes; then + +$as_echo "#define HAVE_ATTR_NORETURN 1" >>confdefs.h + + +$as_echo "#define ATTR_NORETURN __attribute__((__noreturn__))" >>confdefs.h + +fi + { $as_ech
pfsync: avoid a recursion on PF_LOCK
Hello, patch below is missing piece to stuff, which I commit on n2k18 [1]. Fairly quickly people, who have PF deployed with pfsync (and are willing to experiment), discovered a panic on PF_LOCK recursion. The recursion is identified by stack as follows: login: panic: rw_enter: pf_lock locking against myself Stopped at db_enter+0x12: popq%r11 TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 147548 82409 00x12 03 ld *192136 5192 0 0x14000 0x2000 softnet db_enter() at db_enter+0x12 panic() at panic+0x120 _rw_enter() at _rw_enter+0x2a3 pfsync_update_state() at pfsync_update_state+0x90 pf_test() at pf_test+0x608 ip_output() at ip_output+0x6cd pfsync_sendout() at pfsync_sendout+0x645 pfsync_bulk_start() at pfsync_bulk_start+0x127 pfsync_in_ureq() at pfsync_in_ureq+0x71 pfsync_input() at pfsync_input+0x3ab ip_deliver() at ip_deliver+0x203 ipintr() at ipintr+0x6a if_netisr(813abf00) at if_netisr+0x5a taskq_thread(0) at taskq_thread+0x6d Big thanks goes to hrvoje@, who provided me with proper set up so I could debug the patch. And also a fair amount of credit goes to Christiano Haesbaert for idea to dispatch outbound psync update packet to task. The change in patch is conservative. The pfsync behavior does not change unless PF gets compiled with '-DWITH_PF_LOCK' option. '-DWITH_PF_LOCK' tells pfsync to use a softnet task to transmit packet with update (see net/if.c [2]) Also let me clarify my earlier commit message [1] as I've received few questions about it. At the moment my changes, which unlock PF are experimental. The code is disabled by default. In order to test my code you need to compile kernel with '-DWITH_PF_LOCK'. The easiest thing is to patch CONFIG.MP with change below: 8<---8<---8<--8< diff --git a/sys/arch/amd64/conf/GENERIC.MP b/sys/arch/amd64/conf/GENERIC.MP index 7f195a53092..69c62d23cf8 100644 --- a/sys/arch/amd64/conf/GENERIC.MP +++ b/sys/arch/amd64/conf/GENERIC.MP @@ -2,6 +2,7 @@ include "arch/amd64/conf/GENERIC" +option WITH_PF_LOCK option MULTIPROCESSOR #optionMP_LOCKDEBUG #optionWITNESS 8<---8<---8<--8< Compiling kernel with -DWITH_PF_LOCK will get you PF with two rw-locks: pf_lock, which serializes packets with ioctls. Currently pf_lock, is always grabbed exclusively. pf_state_lock, which packet grabs as a reader to look up a state, if no state is found, then packet drops the pf_state_lock and proceeds to rules. At that point packet grabs pf_lock as a writer. If the packet matches a pass rule, then it must grab pf_state_lock as a writer in order to insert a new state to table. The pf_state_lock allows two pf_test() function to run in parallel for packets, which match existing state. Packets, which don't match the state, are serialized on pf_lock. Mentioning parallelism: there is yet another change you need to perform in order to get more pf_test() instances running. Currently there is only single input task, which processes inbound packets. In order to allow more input tasks one has to change NET_TASKQ constant found in net/if.c. Patch below tells kernel to use two input tasks: 8<---8<---8<--8< diff --git a/sys/net/if.c b/sys/net/if.c index 78d7d3f3af6..87fca3921f1 100644 --- a/sys/net/if.c +++ b/sys/net/if.c @@ -232,7 +232,7 @@ int ifq_congestion; int netisr; -#defineNET_TASKQ 1 +#defineNET_TASKQ 4 struct taskq *nettqmp[NET_TASKQ]; struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); 8<---8<---8<--8< Finally let me explain few details found in my patch: @@ -288,6 +296,9 @@ pfsyncattach(int npfsync) { if_clone_attach(&pfsync_cloner); pfsynccounters = counters_alloc(pfsyncs_ncounters); +#ifdef WITH_PF_LOCK + mq_init(&pfsync_mq, 4096, IPL_SOFTNET); +#endif /* WITH_PF_LOCK */ } The queue limit of 4k is empirically determined by testing on set up provided hrvoje@. The tests were running on intel 10Gb NICs. Using 4k provides sufficient buffer to limit drops of pfsync packets to <5%. The drops, which I could see when using 4k limit, were coming from ip_output() here: 680 for (off = hlen + len; off < ntohs(ip->ip_len); off += len) { 681 MGETHDR(m, M_DONTWAIT, MT_HEADER); 682 if (m == NULL) { 683 ipstat_inc(ips_odropped); 684 error = ENOBUFS; 685 goto sendorfree; 686 } clearly the syste
Re: bgpd, withdraws and stuck routes
On Thu, Sep 27, 2018 at 04:41:22PM +0200, Claudio Jeker wrote: > Some people noticed that routes get stuck or to be more precise that > withdraws are not sent to peers in some cases. Until now bgpd did not > really track what was announced and what not (there is no real > Adj-RIB-Out) and instead tried to figure out if it should or should not > send the withdraw. With the increased filtering we try to push this fails > more often and the result is rather terrible. > Instead now introduce minimal tracking of announced prefixes. This adds a > per peer RB tree that tracks what was sent out as UPDATE. This can be > considered a minimal Adj-RIB-Out. As usual once you send it out you realize that something is missing. I first only looked at callers of up_generate_updates() but then I double checked the sofftreconfig out handler and noticed that we need the same magic there too (else the accouting is getting off). The following diff does this now as well. Double checked all other up_generate() users as well. -- :wq Claudio Index: rde.c === RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v retrieving revision 1.427 diff -u -p -r1.427 rde.c --- rde.c 25 Sep 2018 08:08:38 - 1.427 +++ rde.c 27 Sep 2018 15:49:42 - @@ -3147,9 +3147,11 @@ rde_softreconfig_out_peer(struct rib_ent /* nothing todo */ if (oa == ACTION_DENY && na == ACTION_ALLOW) { /* send update */ + up_rib_add(peer, re); up_generate(peer, &nstate, &addr, pt->prefixlen); } else if (oa == ACTION_ALLOW && na == ACTION_DENY) { /* send withdraw */ + up_rib_remove(peer, re); up_generate(peer, NULL, &addr, pt->prefixlen); } else if (oa == ACTION_ALLOW && na == ACTION_ALLOW) { /* send update if anything changed */ @@ -3197,6 +3199,7 @@ rde_softreconfig_unload_peer(struct rib_ prefix_nhflags(p)); if (rde_filter(out_rules_tmp, peer, p, &ostate) != ACTION_DENY) { /* send withdraw */ + up_rib_remove(peer, re); up_generate(peer, NULL, &addr, pt->prefixlen); } rde_filterstate_clean(&ostate); Index: rde.h === RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v retrieving revision 1.193 diff -u -p -r1.193 rde.h --- rde.h 20 Sep 2018 11:45:59 - 1.193 +++ rde.h 27 Sep 2018 15:48:02 - @@ -55,6 +55,7 @@ LIST_HEAD(aspath_head, rde_aspath); TAILQ_HEAD(aspath_queue, rde_aspath); RB_HEAD(uptree_prefix, update_prefix); RB_HEAD(uptree_attr, update_attr); +RB_HEAD(uptree_rib, update_rib); struct rib_desc; struct rib; @@ -70,6 +71,7 @@ struct rde_peer { struct bgpd_addr remote_addr; struct bgpd_addr local_v4_addr; struct bgpd_addr local_v6_addr; + struct uptree_ribup_rib; struct uptree_prefix up_prefix; struct uptree_attr up_attrs; struct uplist_attr updates[AID_MAX]; @@ -566,6 +568,8 @@ int nexthop_compare(struct nexthop *, /* rde_update.c */ voidup_init(struct rde_peer *); voidup_down(struct rde_peer *); +int up_rib_remove(struct rde_peer *, struct rib_entry *); +voidup_rib_add(struct rde_peer *, struct rib_entry *); int up_test_update(struct rde_peer *, struct prefix *); int up_generate(struct rde_peer *, struct filterstate *, struct bgpd_addr *, u_int8_t); Index: rde_update.c === RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v retrieving revision 1.99 diff -u -p -r1.99 rde_update.c --- rde_update.c18 Sep 2018 16:54:01 - 1.99 +++ rde_update.c27 Sep 2018 15:51:22 - @@ -53,9 +53,15 @@ struct update_attr { u_int16_tmpattr_len; }; +struct update_rib { + RB_ENTRY(update_rib) entry; + struct rib_entry*re; +}; + void up_clear(struct uplist_attr *, struct uplist_prefix *); intup_prefix_cmp(struct update_prefix *, struct update_prefix *); intup_attr_cmp(struct update_attr *, struct update_attr *); +intup_rib_cmp(struct update_rib *, struct update_rib *); intup_add(struct rde_peer *, struct update_prefix *, struct update_attr *); RB_PROTOTYPE(uptree_prefix, update_prefix, entry, up_prefix_cmp) @@ -64,6 +70,9 @@ RB_GENERATE(uptree_prefix, update_prefix RB_PROTOTYPE(uptree_attr, update_attr, entry, up_attr_cmp) RB_GENERATE(uptree_attr, update_attr, entry, up_attr_cmp) +RB_PROTOTYPE(uptree_rib, update_rib, entry, up_rib_cmp) +RB_GENERATE(uptree_rib, update_rib, entry, up_rib_cmp) + SIPHASH_KEY uptree_key; void @@ -77,6 +86,7 @@ up_init(struct r
Re: ftp -w is not dying
On Thu, Sep 27, 2018 at 10:47 AM sven falempin wrote: > I m not sure how this is possible but here s the data : > > i used the ENV to push -w 5 in my pkg_add process : > # date > Thu Sep 27 10:40:28 EDT 2018 > # ps auxww | grep pkgfet > _pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5 > -S session -o - https:// > myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz > # fstat -p 60348 > _pkgfetc ftp 60348 5* internet stream tcp 0x806e8548 > 172.16.1.35:5512 --> 92.222.70.241:443 > > I can see the PF state on the natting device: > > tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443 > ESTABLISHED:ESTABLISHED > age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a > rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged > READY_TO_NAT > > this is stock 6.3 ftp which is still the same now ( > https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ ) > > # ktrace -p 60348 > generate an empty 64 bit file. > > Any other test i could do to understand ? > > My understanding of the code is that the -w cannot work . I have been using a lot of socket and c code, and always ended using non blocking fd because : life the code is not using non blocking fd , and rely on the POLLIN of tls in a strange way The syscall in fetch.c are ( geee the get_url is crazy long and full o ifdef :s ) error = getaddrinfo(host, port, &hints, &res0); fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol); (void)signal(SIGALRM, tooslow); alarmtimer(connect_timeout); connect ( and tls_init and stuff and then directly tls_read like that : do { tret = tls_read(tls, buf, len); } while (tret == TLS_WANT_POLLIN || tret == TLS_WANT_POLLOUT); i did not code much with tls_read but calling tls_read on TLS_WANT_POLLOUT seems far FETCH Cheers. -- -- - Knowing is not enough; we must apply. Willing is not enough; we must do
ftp -w is not dying
I m not sure how this is possible but here s the data : i used the ENV to push -w 5 in my pkg_add process : # date Thu Sep 27 10:40:28 EDT 2018 # ps auxww | grep pkgfet _pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5 -S session -o - https:// myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz # fstat -p 60348 _pkgfetc ftp 60348 5* internet stream tcp 0x806e8548 172.16.1.35:5512 --> 92.222.70.241:443 I can see the PF state on the natting device: tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443 ESTABLISHED:ESTABLISHED age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged READY_TO_NAT this is stock 6.3 ftp which is still the same now ( https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ ) # ktrace -p 60348 generate an empty 64 bit file. Any other test i could do to understand ? -- - Knowing is not enough; we must apply. Willing is not enough; we must do
bgpd, withdraws and stuck routes
Some people noticed that routes get stuck or to be more precise that withdraws are not sent to peers in some cases. Until now bgpd did not really track what was announced and what not (there is no real Adj-RIB-Out) and instead tried to figure out if it should or should not send the withdraw. With the increased filtering we try to push this fails more often and the result is rather terrible. Instead now introduce minimal tracking of announced prefixes. This adds a per peer RB tree that tracks what was sent out as UPDATE. This can be considered a minimal Adj-RIB-Out. This is an important fix and needs to be tested. -- :wq Claudio Index: rde.h === RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v retrieving revision 1.193 diff -u -p -r1.193 rde.h --- rde.h 20 Sep 2018 11:45:59 - 1.193 +++ rde.h 27 Sep 2018 11:56:43 - @@ -55,6 +55,7 @@ LIST_HEAD(aspath_head, rde_aspath); TAILQ_HEAD(aspath_queue, rde_aspath); RB_HEAD(uptree_prefix, update_prefix); RB_HEAD(uptree_attr, update_attr); +RB_HEAD(uptree_rib, update_rib); struct rib_desc; struct rib; @@ -70,6 +71,7 @@ struct rde_peer { struct bgpd_addr remote_addr; struct bgpd_addr local_v4_addr; struct bgpd_addr local_v6_addr; + struct uptree_ribup_rib; struct uptree_prefix up_prefix; struct uptree_attr up_attrs; struct uplist_attr updates[AID_MAX]; Index: rde_decide.c === RCS file: /cvs/src/usr.sbin/bgpd/rde_decide.c,v retrieving revision 1.71 diff -u -p -r1.71 rde_decide.c --- rde_decide.c29 Aug 2018 08:51:49 - 1.71 +++ rde_decide.c27 Sep 2018 09:25:52 - @@ -264,7 +264,7 @@ prefix_evaluate(struct prefix *p, struct if (LIST_EMPTY(&re->prefix_h)) LIST_INSERT_HEAD(&re->prefix_h, p, rib_l); else { - LIST_FOREACH(xp, &re->prefix_h, rib_l) + LIST_FOREACH(xp, &re->prefix_h, rib_l) { if (prefix_cmp(p, xp) > 0) { LIST_INSERT_BEFORE(xp, p, rib_l); break; @@ -273,6 +273,7 @@ prefix_evaluate(struct prefix *p, struct LIST_INSERT_AFTER(xp, p, rib_l); break; } + } } } Index: rde_update.c === RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v retrieving revision 1.99 diff -u -p -r1.99 rde_update.c --- rde_update.c18 Sep 2018 16:54:01 - 1.99 +++ rde_update.c27 Sep 2018 13:34:56 - @@ -53,9 +53,17 @@ struct update_attr { u_int16_tmpattr_len; }; +struct update_rib { + RB_ENTRY(update_rib) entry; + struct rib_entry*re; +}; + void up_clear(struct uplist_attr *, struct uplist_prefix *); intup_prefix_cmp(struct update_prefix *, struct update_prefix *); intup_attr_cmp(struct update_attr *, struct update_attr *); +intup_rib_cmp(struct update_rib *, struct update_rib *); +intup_rib_remove(struct rde_peer *, struct rib_entry *); +void up_rib_add(struct rde_peer *, struct rib_entry *); intup_add(struct rde_peer *, struct update_prefix *, struct update_attr *); RB_PROTOTYPE(uptree_prefix, update_prefix, entry, up_prefix_cmp) @@ -64,6 +72,9 @@ RB_GENERATE(uptree_prefix, update_prefix RB_PROTOTYPE(uptree_attr, update_attr, entry, up_attr_cmp) RB_GENERATE(uptree_attr, update_attr, entry, up_attr_cmp) +RB_PROTOTYPE(uptree_rib, update_rib, entry, up_rib_cmp) +RB_GENERATE(uptree_rib, update_rib, entry, up_rib_cmp) + SIPHASH_KEY uptree_key; void @@ -77,6 +88,7 @@ up_init(struct rde_peer *peer) } RB_INIT(&peer->up_prefix); RB_INIT(&peer->up_attrs); + RB_INIT(&peer->up_rib); peer->up_pcnt = 0; peer->up_acnt = 0; peer->up_nlricnt = 0; @@ -110,13 +122,18 @@ up_clear(struct uplist_attr *updates, st void up_down(struct rde_peer *peer) { - u_int8_ti; + struct update_rib *ur, *nur; + u_int8_ti; for (i = 0; i < AID_MAX; i++) up_clear(&peer->updates[i], &peer->withdraws[i]); RB_INIT(&peer->up_prefix); RB_INIT(&peer->up_attrs); + RB_FOREACH_SAFE(ur, uptree_rib, &peer->up_rib, nur) { + RB_REMOVE(uptree_rib, &peer->up_rib, ur); + free(ur); + } peer->up_pcnt = 0; peer->up_acnt = 0; @@ -195,6 +212,42 @@ up_attr_cmp(struct update_attr *a, struc } int +up_rib_cmp(struct update_rib *a, struct update_rib *b) +{ +
Re: pfctl(8) and securelevel(7)
Here: > Wiadomość napisana przez Klemens Nanni w dniu 27.09.2018, > o godz. 15:19: > > What version are you running? sorry, forgot mention: 6.3 -stable > > On Thu, Sep 27, 2018 at 02:06:44PM +0200, Zbyszek Żółkiewski wrote: >> At securelevel(7) set to 2, NAT rules and filter cannot be altered, however >> as stated in pfctl.conf(5) manual - it is possible to modify tables by >> adding/deleting entries >> (https://man.openbsd.org/pf.conf.5#TABLES) >> >> and this works fine. Question: why it is not possible to list content of >> tables?: >> n >> kern.securelevel=2 >> pfctl -t whitelist -T show >> pfctl: Operation not permitted. >> >> while: >> kern.securelevel=1 >> pfctl -t whitelist -T show >> 192.168.1.7 >> 192.168.1.20 >> 192.168.1.25 >> >> and more odd, adding -v flag allow list it anyway: >> >> pfctl -t whitelist -v -T show >> 192.168.1.7 >>Cleared: Thu Sep 27 13:47:58 2018 >> 192.168.1.20 >>Cleared: Thu Sep 27 13:47:58 2018 >> >> I am bit confused, this is bug or i am missing something ? > So am I. Did you add `-v' while securelevel was set to 2 or 1? it was set 2 > > Please provide a clear way to reproduce your scenario, possibly > including the table definitions from your pf.conf. pf.conf snippet: table persist file "/etc/pf/whitelist” counters whitelist contains list of IPs to reproduce: - at securelevel=1 - load pf.conf - file whitelist is populated with IP addresses - try to list table: pfctl -t whitelist -T show - will all work as expected - set securelevel=2 (sysctl kern.securelevel=2) - repeat command: pfctl -t whitelist -T show - this result in "Operation not permitted” - now try: pfctl -t whitelist -v -T show - this will result with printed table contents as well as some stats _ Zbyszek Żółkiewski
Re: pfctl(8) and securelevel(7)
What version are you running? On Thu, Sep 27, 2018 at 02:06:44PM +0200, Zbyszek Żółkiewski wrote: > At securelevel(7) set to 2, NAT rules and filter cannot be altered, however > as stated in pfctl.conf(5) manual - it is possible to modify tables by > adding/deleting entries > (https://man.openbsd.org/pf.conf.5#TABLES) > > and this works fine. Question: why it is not possible to list content of > tables?: >n > kern.securelevel=2 > pfctl -t whitelist -T show > pfctl: Operation not permitted. > > while: > kern.securelevel=1 > pfctl -t whitelist -T show >192.168.1.7 >192.168.1.20 >192.168.1.25 > > and more odd, adding -v flag allow list it anyway: > > pfctl -t whitelist -v -T show >192.168.1.7 > Cleared: Thu Sep 27 13:47:58 2018 >192.168.1.20 > Cleared: Thu Sep 27 13:47:58 2018 > > I am bit confused, this is bug or i am missing something ? So am I. Did you add `-v' while securelevel was set to 2 or 1? Please provide a clear way to reproduce your scenario, possibly including the table definitions from your pf.conf.
pfctl(8) and securelevel(7)
Hi list, At securelevel(7) set to 2, NAT rules and filter cannot be altered, however as stated in pfctl.conf(5) manual - it is possible to modify tables by adding/deleting entries (https://man.openbsd.org/pf.conf.5#TABLES) and this works fine. Question: why it is not possible to list content of tables?: kern.securelevel=2 pfctl -t whitelist -T show pfctl: Operation not permitted. while: kern.securelevel=1 pfctl -t whitelist -T show 192.168.1.7 192.168.1.20 192.168.1.25 and more odd, adding -v flag allow list it anyway: pfctl -t whitelist -v -T show 192.168.1.7 Cleared: Thu Sep 27 13:47:58 2018 192.168.1.20 Cleared: Thu Sep 27 13:47:58 2018 I am bit confused, this is bug or i am missing something ? _ Zbyszek Żółkiewski
Re: bgpd ROA validation
On Thu, Sep 27, 2018 at 09:39:36AM +0200, Claudio Jeker wrote: > On Wed, Sep 26, 2018 at 06:24:36PM +0200, Claudio Jeker wrote: > > On Tue, Sep 25, 2018 at 12:23:48PM +0200, Claudio Jeker wrote: > > > On Sat, Sep 22, 2018 at 09:48:24PM +, Job Snijders wrote: > > > > Hi claudio, > > > > > > > > Seems we are getting very close. Some suggestions to simplify the > > > > experience for the end user. > > > > > > > > Let's start with supporting just one (unnamed) roa-set, so far I've > > > > really not come across a use case where multiple ROA tables are useful. > > > > I say this having implemented origin validation on both ISPs and IXPs. > > > > > > > > The semantics of the Origin Validation procedure that apply to > > > > "Validated ROA Payloads" are not compatible with how the ARIN WHOIS > > > > OriginAS semantics, so roa-set will never be used for ARIN WHOIS data. > > > > > > Please explain further, because honestly both the ruleset that > > > arouteserver generates for ARIN WHOIS OriginAS and ROA are doing > > > the same. They connect a source-as with a prefix. This is what a > > > roa-set is giving you. It implements a way to quickly lookup a 2 key > > > element (AS + prefix). Depending on how this table is used it can be used > > > for both cases. This is the technical view based on looking at rulesets > > > and > > > thinking on how to write them in a way that is fast and understandable. > > > I understand that from an administration point of view RPKI ROA is much > > > stronger and can therefor be used more strictly (e.g. with the simple rule > > > deny quick from ebgp roa-set RPKI invalid). The generic origin > > > validatiation > > > as a supporting measure to IRR based prefixset filtering is only allowing > > > additional prefixes based on the roa-set (e.g. match from ebgp \ > > > large-community $INTCOMM_ORIGIN_OK roa-set ARINDB valid \ > > > set large-community $INTCOMM_PREF_OK). They are two different things but > > > can use the same filter building blocks. > > > Maybe naming it roa-set is wrong (since it is too much connected with > > > RPKI) but the lookup table is usable for more than just RPKI. This is why > > > I think supporting multiple tables is benefitial. I also agree that a > > > simplified user experience for RPKI is a good thing, it should be simple > > > to use. > > > > > > > There currently is 1 RPKI, and therefor we should have just 1 roa-set. > > > > > > I would fully agree here if ARIN would make their trust anchor freely > > > distributable. But yes in general only one RPKI table is needed. > > > My point is that roa-set is more generic than RPKI. It is not an RPKI only > > > thing. It can be used for more than that and we should support this as > > > well since it makes for much better and efficient configs. > > > > > > > The advantage of having only one roa-set is that it does not need to be > > > > referenced in the policies. > > > > > > > > I propose the patch is amended to accomodate the following: > > > > > > > > roa-set { > > > > 165.254.255.0/24 source-as 15562 > > > > 193.0.0.0/21 source-as > > > > } > > > > > > > > deny from any ovs invalid > > > > match from any ovs valid set community local-as:42 > > > > match from any ovs unknown set community local-as:43 > > > > > > > > I suggest the filter keyword 'ovs' (origin validation state) is > > > > introduced to allow easy matching. I think this looks better. If the > > > > roa-set is not defined or empty, all route announcements are 'unknown'. > > > > > > If we want to use ovs then the naming scheme should be the same as for the > > > extended community. At least if we reuse this keyword it should work the > > > same. Also the side-effect is that the two configs look fairly similar > > > which can be good or bad: > > > match from any ovs valid set community local-as:42 > > > match from any ext-community ovs valid set community local-as:42 > > > > > > I'm not against this, just wanted to bring it up. > > > > > > Also I think unknown should be renamed not-found. I will do that since > > > not-found is what the RFC is using. > > > > > > I will rework the diff considering the input. > > > > Here we go. This changes the diff so there is only one roa-set like > > mentioned above: > > > > roa-set { > > 165.254.255.0/24 source-as 15562 > > 193.0.0.0/21 source-as > > } > > > > deny from any ovs invalid > > match from any ovs valid set community local-as:42 > > match from any ovs unknown set community local-as:43 > > > > Additionally I left the old sets around but used a bit different: > > > > origin-set FOO { > > ... > > } > > > > match from any community $ORIGIN_AS_OK origin-set FOO valid \ > > set community $ORIGIN_PREF_OK > > > > I may even consider to drop the 'valid' in the above rule since that is > > the only check that kind of matt
Re: bgpd ROA validation
On Wed, Sep 26, 2018 at 06:24:36PM +0200, Claudio Jeker wrote: > On Tue, Sep 25, 2018 at 12:23:48PM +0200, Claudio Jeker wrote: > > On Sat, Sep 22, 2018 at 09:48:24PM +, Job Snijders wrote: > > > Hi claudio, > > > > > > Seems we are getting very close. Some suggestions to simplify the > > > experience for the end user. > > > > > > Let's start with supporting just one (unnamed) roa-set, so far I've > > > really not come across a use case where multiple ROA tables are useful. > > > I say this having implemented origin validation on both ISPs and IXPs. > > > > > > The semantics of the Origin Validation procedure that apply to > > > "Validated ROA Payloads" are not compatible with how the ARIN WHOIS > > > OriginAS semantics, so roa-set will never be used for ARIN WHOIS data. > > > > Please explain further, because honestly both the ruleset that > > arouteserver generates for ARIN WHOIS OriginAS and ROA are doing > > the same. They connect a source-as with a prefix. This is what a > > roa-set is giving you. It implements a way to quickly lookup a 2 key > > element (AS + prefix). Depending on how this table is used it can be used > > for both cases. This is the technical view based on looking at rulesets and > > thinking on how to write them in a way that is fast and understandable. > > I understand that from an administration point of view RPKI ROA is much > > stronger and can therefor be used more strictly (e.g. with the simple rule > > deny quick from ebgp roa-set RPKI invalid). The generic origin validatiation > > as a supporting measure to IRR based prefixset filtering is only allowing > > additional prefixes based on the roa-set (e.g. match from ebgp \ > > large-community $INTCOMM_ORIGIN_OK roa-set ARINDB valid \ > > set large-community $INTCOMM_PREF_OK). They are two different things but > > can use the same filter building blocks. > > Maybe naming it roa-set is wrong (since it is too much connected with > > RPKI) but the lookup table is usable for more than just RPKI. This is why > > I think supporting multiple tables is benefitial. I also agree that a > > simplified user experience for RPKI is a good thing, it should be simple > > to use. > > > > > There currently is 1 RPKI, and therefor we should have just 1 roa-set. > > > > I would fully agree here if ARIN would make their trust anchor freely > > distributable. But yes in general only one RPKI table is needed. > > My point is that roa-set is more generic than RPKI. It is not an RPKI only > > thing. It can be used for more than that and we should support this as > > well since it makes for much better and efficient configs. > > > > > The advantage of having only one roa-set is that it does not need to be > > > referenced in the policies. > > > > > > I propose the patch is amended to accomodate the following: > > > > > > roa-set { > > > 165.254.255.0/24 source-as 15562 > > > 193.0.0.0/21 source-as > > > } > > > > > > deny from any ovs invalid > > > match from any ovs valid set community local-as:42 > > > match from any ovs unknown set community local-as:43 > > > > > > I suggest the filter keyword 'ovs' (origin validation state) is > > > introduced to allow easy matching. I think this looks better. If the > > > roa-set is not defined or empty, all route announcements are 'unknown'. > > > > If we want to use ovs then the naming scheme should be the same as for the > > extended community. At least if we reuse this keyword it should work the > > same. Also the side-effect is that the two configs look fairly similar > > which can be good or bad: > > match from any ovs valid set community local-as:42 > > match from any ext-community ovs valid set community local-as:42 > > > > I'm not against this, just wanted to bring it up. > > > > Also I think unknown should be renamed not-found. I will do that since > > not-found is what the RFC is using. > > > > I will rework the diff considering the input. > > Here we go. This changes the diff so there is only one roa-set like > mentioned above: > > roa-set { > 165.254.255.0/24 source-as 15562 > 193.0.0.0/21 source-as > } > > deny from any ovs invalid > match from any ovs valid set community local-as:42 > match from any ovs unknown set community local-as:43 > > Additionally I left the old sets around but used a bit different: > > origin-set FOO { > ... > } > > match from any community $ORIGIN_AS_OK origin-set FOO valid \ > set community $ORIGIN_PREF_OK > > I may even consider to drop the 'valid' in the above rule since that is > the only check that kind of matters in this case. > New version removing the validity argument from origin-set FOO filter rules as mentioned above. Apart from that it should be the same. -- :wq Claudio Index: bgpd.c === RCS file: /cvs/src/usr.sbin/bgpd/bgpd.