Re: Panic with pf 'once' rule
Aaron Bieber writes:

> Hi,
>
> Adding a rule similar to the below causes a panic on -current (OpenBSD
> 6.5-current (GENERIC) #95: Thu Jul 4 21:22:25 MDT 2019). This also
> panics 6.3 and 6.5 (I didn't test 6.4):
>
>   pass in quick on egress proto tcp from any to port once rdr-to \
>       127.0.0.1 port
>
> Once the rule is in place, fire up:
>
>   nc -l 127.0.0.1
>
> Connect a few times from a remote machine:
>
>   nc
>
> Eventually it will panic with (sometimes it happens right away, other
> times I have to restart nc a few times):

This is because it's meant to be used inside of an anchor
(it removes the rule once it's matched).

The most sensible way to use it is to put it into an anchor
inside a recursive anchor (e.g. 'relayd/*').

It's possible that the check protecting the system from misuse
like you've described here got lost during refactoring, or it
never existed in the first place :-(

Cheers,
Mike

> ddb> trace
> pf_rm_rule(81d900a8,803bbfe8) at pf_rm_rule+0xa9
> pf_purge_rule(803bbfe8) at pf_purge_rule+0x26
> pf_purge(81dc1088) at pf_purge+0x55
> taskq_thread(80022040) at taskq_thread+0x3d
> end trace frame: 0x0, count: -4
> ddb>
> ddb> ps
>    PID     TID   PPID  UID  S       FLAGS  WAIT       COMMAND
>  69502   12189      1    0  3    0x100083  ttyin      ksh
>  53972  340673      1    0  3    0x100098  poll       cron
>  81827  279222      1  110  3    0x100090  poll       sndiod
>  54852   68160      1   99  3    0x100090  poll       sndiod
>  79474   94554   3215   95  3    0x100092  kqread     smtpd
>  90212  164878   3215  103  3    0x100092  kqread     smtpd
>  43199  482512   3215   95  3    0x100092  kqread     smtpd
>  38765  100663   3215   95  3    0x100092  kqread     smtpd
>  33241  424770   3215   95  3    0x100092  kqread     smtpd
>   5338  193750   3215   95  3    0x100092  kqread     smtpd
>   3215  481909      1    0  3    0x100080  kqread     smtpd
>  57742  143403      1    0  3        0x80  select     sshd
>  31904  460143      1    0  3    0x100080  poll       ntpd
>  65592  182120  47006   83  3    0x100092  poll       ntpd
>  47006  103509      1   83  3    0x100092  poll       ntpd
>  60875  292765  99617   74  3    0x100092  bpf        pflogd
>  99617  524148      1    0  3        0x80  netio      pflogd
>   4242  324170  49064   73  3    0x100090  kqread     syslogd
>  49064  413359      1    0  3    0x100082  netio      syslogd
>  20955  102995  68995  115  3    0x100092  kqread     slaacd
>  99883  518930  68995  115  3    0x100092  kqread     slaacd
>  68995  175540      1    0  3    0x100080  kqread     slaacd
>   5278  238159      0    0  3     0x14200  pgzero     zerothread
>   2253  479921      0    0  3     0x14200  aiodoned   aiodoned
>  98149  310276      0    0  3     0x14200  syncer     update
>  78055  259911      0    0  3     0x14200  cleaner    cleaner
>  68827  324781      0    0  3     0x14200  reaper     reaper
>  93269   98863      0    0  3     0x14200  pgdaemon   pagedaemon
>  75284  447451      0    0  3     0x14200  bored      crynlk
>  34868  513191      0    0  3     0x14200  bored      crypto
> *18776  255193      0    0  7     0x14200             softnet
>  64918  469356      0    0  3     0x14200  bored      systqmp
>    902   49537      0    0  3     0x14200  bored      systq
>  17250  200730      0    0  3  0x40014200  bored      softclock
>   2990  510299      0    0  3  0x40014200             idle0
>    947  215447      0    0  3     0x14200  bored      smr
>      1  180680      0    0  3        0x82  wait       init
>      0       0     -1    0  3     0x10200  scheduler  swapper
> ddb>
>
> dmesg (from a VM in vmm - I have also reproduced this on physical hw):
> OpenBSD 6.5-current (GENERIC) #95: Thu Jul 4 21:22:25 MDT 2019
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> real mem = 4278181888 (4079MB)
> avail mem = 4138524672 (3946MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf3f10 (12 entries)
> bios0: vendor SeaBIOS version "1.11.0p2-OpenBSD-vmm" date 01/01/2011
> bios0: OpenBSD VMM
> acpi at bios0 not configured
> cpu0 at mainbus0: (uniprocessor)
> cpu0: AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx, 37466.79 MHz, 17-11-00
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SH
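For readers unfamiliar with the intended pattern Mike describes: a hedged sketch of what "putting a once rule into an anchor" could look like in pf.conf. The anchor name `oneshot` and the ports are made up for illustration; the original report's host and ports were elided.

```
# pf.conf: evaluate rules loaded into sub-anchors of "oneshot" at runtime
anchor "oneshot/*"

# Loaded separately into a sub-anchor, e.g.:
#   echo "$rule" | pfctl -a oneshot/demo -f -
# where $rule is a one-shot redirect that removes itself after its
# first match, consuming only the anchor's copy of the rule:
#   pass in quick on egress proto tcp to any port 8080 once \
#       rdr-to 127.0.0.1 port 8081
```

This keeps the self-removing rule out of the main ruleset, which is the usage the `once` keyword was designed around.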
Re: Panic with pf 'once' rule
I get the panic when the rule is in an anchor as well.

PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE

> On Jul 7, 2019, at 4:39 PM, Mike Belopuhov wrote:
>
> Aaron Bieber writes:
>
>> Hi,
>>
>> Adding a rule similar to the below causes a panic on -current (OpenBSD
>> 6.5-current (GENERIC) #95: Thu Jul 4 21:22:25 MDT 2019). This also
>> panics 6.3 and 6.5 (I didn't test 6.4):
>>
>>   pass in quick on egress proto tcp from any to port once rdr-to \
>>       127.0.0.1 port
>>
>> Once the rule is in place, fire up:
>>
>>   nc -l 127.0.0.1
>>
>> Connect a few times from a remote machine:
>>
>>   nc
>>
>> Eventually it will panic with (sometimes it happens right away, other
>> times I have to restart nc a few times):
>
> This is because it's meant to be used inside of an anchor
> (it removes the rule once it's matched).
>
> The most sensible way to use it is to put it into the anchor
> inside a recursive anchor (e.g. 'relayd/*').
>
> It's possible that the check protecting the system from
> the misuse like you've described here got lost during
> refactoring or it never existed in the first place :-(
>
> Cheers,
> Mike
>
>> [ddb trace, ps output, and dmesg quoted in full above; snipped]
Re: Panic with pf 'once' rule
On Tue, Jul 09, 2019 at 03:35:26PM -0600, Aaron Bieber wrote:
> I get the panic when the rule is in an anchor as well.
>
> PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE
>
>> On Jul 7, 2019, at 4:39 PM, Mike Belopuhov wrote:
>>
>> [earlier discussion snipped]
>>
>>> ddb> trace
>>> pf_rm_rule(81d900a8,803bbfe8) at pf_rm_rule+0xa9
>>> pf_purge_rule(803bbfe8) at pf_purge_rule+0x26
>>> pf_purge(81dc1088) at pf_purge+0x55
>>> taskq_thread(80022040) at taskq_thread+0x3d
>>> end trace frame: 0x0, count: -4

[cc'ing sashan@ because this relates to pf.c r1.983]

I can replicate the crash as well. The crash only happens if more than
one connection that matches the once rule is made in a short period of
time. abieber@ triggered this when he ran "nc " a few times.

After going through the code, I think I found the cause and wrote a
possible fix.
As of pf.c r1.983, once rules are removed by the purge thread instead of
immediately when a packet matches the rule. The latest code is in
pf_test_rule():

	if (r->rule_flag & PFRULE_ONCE) {
		r->rule_flag |= PFRULE_EXPIRED;
		r->exptime = time_second;
		SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
	}

When a once rule is matched by a packet for the first time, the rule is
marked as expired and added to pf_rule_gcl. Eventually the purge thread
runs, which calls pf_purge_expired_rules() to purge the expired rules in
pf_rule_gcl:

	while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
		SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
		KASSERT(r->rule_flag & PFRULE_EXPIRED);
		pf_purge_rule(r);
	}

Since the purge thread only runs periodically, there is a short time
window where a once rule is expired but not yet purged. If a second
connection matching the once rule arrives in that window, the
pf_test_rule() code block is executed again, adding the same once rule
to pf_rule_gcl a second time. As a result, when the purge thread
finally runs, it tries to remove the once rule multiple times,
crashing as seen in the backtrace:

	pf_rm_rule(81d900a8,803bbfe8) at pf_rm_rule+0xa9
	pf_purge_rule(803bbfe8) at pf_purge_rule+0x26
	pf_purge(81dc1088) at pf_purge+0x55
	taskq_thread(80022040) at taskq_thread+0x3d

There is also a related bug: in that expired-but-not-purged window,
pf_test_rule() still creates state for new connections matching the
expired once rule. In other words, the once rule is not truly a
one-shot rule during that window.

To fix both bugs, I wrote this diff, which adds a check in
pf_test_rule() to prevent expired once rules from being added to
pf_rule_gcl again. The check is added "early" in pf_test_rule() to
prevent any further connections from creating state if they match the
expired once rule.

I have sent this diff to abieber@, who confirmed that it fixes the
crash for him.

Thoughts? ok?
Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1084
diff -u -p -r1.1084 pf.c
--- pf.c	9 Jul 2019 11:30:19 -	1.1084
+++ pf.c	12 Jul 2019 02:54:33 -
@@ -3862,6 +3862,13 @@ pf_test_rule(struct pf_pdesc *pd, struct
 	if (r->action == PF_DROP)
 		goto cleanup;
 
+	/*
+	 * If an expired "once" rule has not been purged, drop any new matching
+	 * packets to prevent the rule from being added to pf_rule_gcl again.
+	 */
+	if (r->rule_flag & PFRULE_EXPIRED)
+		goto cleanup;
+
 	pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
 	if (ctx.act.rtableid >= 0 &&
 	    rtable_l2(ctx.act.rtableid) != pd->rdomain)
Re: Panic with pf 'once' rule
Hello,

> To fix both bugs, I wrote this diff that adds a check in pf_test_rule()
> to prevent expired once rules from being added to pf_rule_gcl. The
> check is added "early" in pf_test_rule() to prevent any further
> connections from creating state if they match the expired once rule.
>
> I have sent this diff to abieber@ who confirmed that it fixes the crash
> for him.
>
> Thoughts? ok?

thanks for the detailed analysis. it is correct.

> [Lawrence's diff quoted in full; snipped]

your change looks good and makes sense. However I would ask for one more
tweak to your fix:

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 9e454e5c941..df75d7e4acf 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -3860,6 +3860,13 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm,
 	if (r->action == PF_DROP)
 		goto cleanup;
 
+	/*
+	 * If an expired "once" rule has not been purged, drop any new matching
+	 * packets.
+	 */
+	if (r->rule_flag & PFRULE_EXPIRED)
+		goto cleanup;
+
 	pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
 	if (ctx.act.rtableid >= 0 &&
 	    rtable_l2(ctx.act.rtableid) != pd->rdomain)
@@ -3945,9 +3952,18 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm,
 #endif /* NPFSYNC > 0 */
 
 	if (r->rule_flag & PFRULE_ONCE) {
-		r->rule_flag |= PFRULE_EXPIRED;
-		r->exptime = time_second;
-		SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
+		u_int32_t rule_flag;
+
+		/*
+		 * Use atomic_cas() to determine a clear winner, which will
+		 * insert an expired rule to gcl.
+		 */
+		rule_flag = r->rule_flag;
+		if (atomic_cas_uint(&r->rule_flag, rule_flag,
+		    rule_flag | PFRULE_EXPIRED) == rule_flag) {
+			r->exptime = time_second;
+			SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
+		}
 	}
 
 	return (action);
8<---8<---8<--8<

As soon as there are more PF tasks running in parallel, we would be able
to hit a crash similar to the one you are fixing now. The rules are
covered by a read lock, so with more PF tasks there might be two packets
racing to expire the same once rule at the same time. Using atomic_cas()
is a sufficient measure to serialize the competing packets.

you have my OK with/without the suggested tweak.

thanks and regards
sashan
Re: Panic with pf 'once' rule
On Mon, Jul 15, 2019 at 11:06:50AM +0200, Alexandr Nedvedicky wrote:
> your change looks good and makes sense. However I would ask for one more
> tweak to your fix:
>
> [suggested diff quoted in full above; snipped]
>
> As soon as there will be more PF tasks running in parallel, we would be
> able to hit similar crash you are fixing now. The rules are covered by
> read lock, so with more PF tasks there might be two packets racing
> to expire at once rule at the same time. Using atomic_cas() is sufficient
> measure to serialize competing packets.
>
> you have my OK with/without the suggested tweak.

Thank you! I have committed the fix with your tweak.

Lawrence