Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On 19 Feb 2019, at 22:53, Andreas Longwitz wrote:
>>> Because fetching a counter is a rather expensive function we should use
>>> counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
>>> pass" rule should not cause more effort than separate "rdr" and "pass"
>>> rules. For rules with adaptive timeout values the call of
>>> counter_u64_fetch() should be accepted, but otherwise not.
>>>
>>> For a small gain in performance, especially for "rdr pass" rules, I
>>> suggest something like
>>>
>>> --- pf.c.orig	2019-02-18 17:49:22.944751000 +0100
>>> +++ pf.c	2019-02-18 17:55:07.396163000 +0100
>>> @@ -1558,7 +1558,7 @@
>>>  	if (!timeout)
>>>  		timeout = V_pf_default_rule.timeout[state->timeout];
>>>  	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
>>> -	if (start) {
>>> +	if (start && state->rule.ptr != &V_pf_default_rule) {
>>>  		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
>>>  		states = counter_u64_fetch(state->rule.ptr->states_cur);
>>>  	} else {
>>
>> I think that looks correct. Do you have any performance measurements on
>> this?
>
> No
>
>> Although presumably it only really matters in cases where there’s no
>> explicit catch-all rule, so I do wonder if it’s worth it.
>
> Sorry, but I do not understand this argument. From the manpage:
>
>    The adaptive timeout values can be defined both globally and for
>    each rule. When used on a per-rule basis, the values relate to the
>    number of states created by the rule, otherwise to the total number
>    of states.
>
> This handling of adaptive timeouts is done in pf_state_expires():
>
>	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
>	if (start) {
>		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
>		states = counter_u64_fetch(state->rule.ptr->states_cur);
>	} else {
>		start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
>		end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
>		states = V_pf_status.states;
>	}
>
> The following calculation needs three values: start, end and states.
>
> 1. Normal rules "pass .." without an adaptive setting (meaning
> start = 0) run in the else-section of the code snippet, therefore take
> "start" and "end" from the global default settings and set "states" to
> pf_status.states (= total number of states).
>
> 2. Special rules like "pass .. keep state (adaptive.start 500
> adaptive.end 1000)" have start != 0, run in the if-section of the code
> snippet, take "start" and "end" from the rule and set "states" to the
> number of states created by their rule using counter_u64_fetch().
>
> That's all ok, but there is a third case without special handling in
> the above code snippet:
>
> 3. All "rdr/nat pass .." statements share the pf_default_rule.
> Therefore we have start != 0 in this case and we run the if-section of
> the code snippet, but we had better run the else-section here and not
> fetch the counter of the pf_default_rule but take the total number of
> states. That's what the patch does.

Thank you, that makes sense. I'd missed the third case.

The patch is in my queue and should get committed soon. Your explanation
makes a great commit message.

Regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Kristof Provost wrote:
> > Because fetching a counter is a rather expensive function we should use
> > counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
> > pass" rule should not cause more effort than separate "rdr" and "pass"
> > rules. For rules with adaptive timeout values the call of
> > counter_u64_fetch() should be accepted, but otherwise not.
> >
> > For a small gain in performance, especially for "rdr pass" rules, I
> > suggest something like
> >
> > --- pf.c.orig	2019-02-18 17:49:22.944751000 +0100
> > +++ pf.c	2019-02-18 17:55:07.396163000 +0100
> > @@ -1558,7 +1558,7 @@
> >  	if (!timeout)
> >  		timeout = V_pf_default_rule.timeout[state->timeout];
> >  	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
> > -	if (start) {
> > +	if (start && state->rule.ptr != &V_pf_default_rule) {
> >  		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
> >  		states = counter_u64_fetch(state->rule.ptr->states_cur);
> >  	} else {
>
> I think that looks correct. Do you have any performance measurements on
> this?

No

> Although presumably it only really matters in cases where there’s no
> explicit catch-all rule, so I do wonder if it’s worth it.

Sorry, but I do not understand this argument. From the manpage:

   The adaptive timeout values can be defined both globally and for each
   rule. When used on a per-rule basis, the values relate to the number
   of states created by the rule, otherwise to the total number of
   states.

This handling of adaptive timeouts is done in pf_state_expires():

	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
	if (start) {
		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
		states = counter_u64_fetch(state->rule.ptr->states_cur);
	} else {
		start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
		end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
		states = V_pf_status.states;
	}

The following calculation needs three values: start, end and states.

1. Normal rules "pass .." without an adaptive setting (meaning start = 0)
run in the else-section of the code snippet, therefore take "start" and
"end" from the global default settings and set "states" to
pf_status.states (= total number of states).

2. Special rules like "pass .. keep state (adaptive.start 500
adaptive.end 1000)" have start != 0, run in the if-section of the code
snippet, take "start" and "end" from the rule and set "states" to the
number of states created by their rule using counter_u64_fetch().

That's all ok, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements share the pf_default_rule. Therefore
we have start != 0 in this case and we run the if-section of the code
snippet, but we had better run the else-section here and not fetch the
counter of the pf_default_rule but take the total number of states.
That's what the patch does.

Regards,
Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On 18 Feb 2019, at 18:30, Andreas Longwitz wrote:
>> Ok, thanks, I will commit the patch shortly. I do not see a point in
>> waiting for two more weeks; just report to me if anything goes wrong.
>
> your patch for counter(9) on i386 definitely solves my problem
> discussed in this thread.
>
> Because fetching a counter is a rather expensive function we should use
> counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
> pass" rule should not cause more effort than separate "rdr" and "pass"
> rules. For rules with adaptive timeout values the call of
> counter_u64_fetch() should be accepted, but otherwise not.
>
> For a small gain in performance, especially for "rdr pass" rules, I
> suggest something like
>
> --- pf.c.orig	2019-02-18 17:49:22.944751000 +0100
> +++ pf.c	2019-02-18 17:55:07.396163000 +0100
> @@ -1558,7 +1558,7 @@
>  	if (!timeout)
>  		timeout = V_pf_default_rule.timeout[state->timeout];
>  	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
> -	if (start) {
> +	if (start && state->rule.ptr != &V_pf_default_rule) {
>  		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
>  		states = counter_u64_fetch(state->rule.ptr->states_cur);
>  	} else {

I think that looks correct. Do you have any performance measurements on
this?

Although presumably it only really matters in cases where there’s no
explicit catch-all rule, so I do wonder if it’s worth it.

Regards,
Kristof
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Thanks, Andreas!

Kristof, will you handle that? If you are busy, I can try to refresh my
memory.

On Mon, Feb 18, 2019 at 06:30:32PM +0100, Andreas Longwitz wrote:
A> Hello,
A>
A> > Ok, thanks, I will commit the patch shortly. I do not see a point
A> > in waiting for two more weeks; just report to me if anything goes
A> > wrong.
A>
A> your patch for counter(9) on i386 definitely solves my problem
A> discussed in this thread.
A>
A> Because fetching a counter is a rather expensive function we should
A> use counter_u64_fetch() in pf_state_expires() only when necessary.
A> A "rdr pass" rule should not cause more effort than separate "rdr"
A> and "pass" rules. For rules with adaptive timeout values the call of
A> counter_u64_fetch() should be accepted, but otherwise not.
A>
A> For a small gain in performance, especially for "rdr pass" rules, I
A> suggest something like
A>
A> --- pf.c.orig	2019-02-18 17:49:22.944751000 +0100
A> +++ pf.c	2019-02-18 17:55:07.396163000 +0100
A> @@ -1558,7 +1558,7 @@
A>  	if (!timeout)
A>  		timeout = V_pf_default_rule.timeout[state->timeout];
A>  	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
A> -	if (start) {
A> +	if (start && state->rule.ptr != &V_pf_default_rule) {
A>  		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
A>  		states = counter_u64_fetch(state->rule.ptr->states_cur);
A>  	} else {
A>
A> --
A> Andreas

--
Gleb Smirnoff
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Hello,

> Ok, thanks, I will commit the patch shortly. I do not see a point in
> waiting for two more weeks; just report to me if anything goes wrong.

your patch for counter(9) on i386 definitely solves my problem discussed
in this thread.

Because fetching a counter is a rather expensive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

For a small gain in performance, especially for "rdr pass" rules, I
suggest something like

--- pf.c.orig	2019-02-18 17:49:22.944751000 +0100
+++ pf.c	2019-02-18 17:55:07.396163000 +0100
@@ -1558,7 +1558,7 @@
 	if (!timeout)
 		timeout = V_pf_default_rule.timeout[state->timeout];
 	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
-	if (start) {
+	if (start && state->rule.ptr != &V_pf_default_rule) {
 		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
 		states = counter_u64_fetch(state->rule.ptr->states_cur);
 	} else {

--
Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On Sat, Feb 02, 2019 at 11:26:45AM +0100, Andreas Longwitz wrote:
> Hello,
>
> > Let's switch to the IPI method for fetch, similar to clear.
> > I do not think that the cost of fetch is too important compared with
> > the race.
> >
> > diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> > index 7fd26d2a960..278f89123a4 100644
> > --- a/sys/i386/include/counter.h
> > +++ b/sys/i386/include/counter.h
> > @@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
> >  }
> >
> >  #ifdef IN_SUBR_COUNTER_C
> > -static inline uint64_t
> > +struct counter_u64_fetch_cx8_arg {
> > +	uint64_t res;
> > +	uint64_t *p;
> > +};
> > +
> > +static uint64_t
> >  counter_u64_read_one_8b(uint64_t *p)
> >  {
> >  	uint32_t res_lo, res_high;
> > @@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
> >  	return (res_lo + ((uint64_t)res_high << 32));
> >  }
> >
> > +static void
> > +counter_u64_fetch_cx8_one(void *arg1)
> > +{
> > +	struct counter_u64_fetch_cx8_arg *arg;
> > +	uint64_t val;
> > +
> > +	arg = arg1;
> > +	val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
> > +	    UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
> > +	atomic_add_64(&arg->res, val);
> > +}
> > +
> >  static inline uint64_t
> >  counter_u64_fetch_inline(uint64_t *p)
> >  {
> > +	struct counter_u64_fetch_cx8_arg arg;
> >  	uint64_t res;
> >  	int i;
> >
> > @@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
> >  		}
> >  		critical_exit();
> >  	} else {
> > -		CPU_FOREACH(i)
> > -			res += counter_u64_read_one_8b((uint64_t *)((char *)p +
> > -			    UMA_PCPU_ALLOC_SIZE * i));
> > +		arg.p = p;
> > +		arg.res = 0;
> > +		smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, &arg);
> > +		res = arg.res;
> >  	}
> >  	return (res);
> >  }
>
> I have integrated this i386 counter(9) patch, with the original
> (unpatched) pf.c, on some of my test servers and everything runs fine.
> Today I have added my main firewall machine and will report the result
> in two weeks. I suppose running counter_u64_fetch() in parallel to
> counter_u64_add() is not a problem anymore.

Ok, thanks, I will commit the patch shortly. I do not see a point in
waiting for two more weeks; just report to me if anything goes wrong.
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Hello,

> Let's switch to the IPI method for fetch, similar to clear.
> I do not think that the cost of fetch is too important compared with
> the race.
>
> diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> index 7fd26d2a960..278f89123a4 100644
> --- a/sys/i386/include/counter.h
> +++ b/sys/i386/include/counter.h
> @@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
>  }
>
>  #ifdef IN_SUBR_COUNTER_C
> -static inline uint64_t
> +struct counter_u64_fetch_cx8_arg {
> +	uint64_t res;
> +	uint64_t *p;
> +};
> +
> +static uint64_t
>  counter_u64_read_one_8b(uint64_t *p)
>  {
>  	uint32_t res_lo, res_high;
> @@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
>  	return (res_lo + ((uint64_t)res_high << 32));
>  }
>
> +static void
> +counter_u64_fetch_cx8_one(void *arg1)
> +{
> +	struct counter_u64_fetch_cx8_arg *arg;
> +	uint64_t val;
> +
> +	arg = arg1;
> +	val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
> +	    UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
> +	atomic_add_64(&arg->res, val);
> +}
> +
>  static inline uint64_t
>  counter_u64_fetch_inline(uint64_t *p)
>  {
> +	struct counter_u64_fetch_cx8_arg arg;
>  	uint64_t res;
>  	int i;
>
> @@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
>  		}
>  		critical_exit();
>  	} else {
> -		CPU_FOREACH(i)
> -			res += counter_u64_read_one_8b((uint64_t *)((char *)p +
> -			    UMA_PCPU_ALLOC_SIZE * i));
> +		arg.p = p;
> +		arg.res = 0;
> +		smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, &arg);
> +		res = arg.res;
>  	}
>  	return (res);
>  }

I have integrated this i386 counter(9) patch, with the original
(unpatched) pf.c, on some of my test servers and everything runs fine.
Today I have added my main firewall machine and will report the result
in two weeks. I suppose running counter_u64_fetch() in parallel to
counter_u64_add() is not a problem anymore.

Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On Thu, Jan 24, 2019 at 11:09:37PM +0100, Andreas Longwitz wrote:
> >> I think the problem is the cmpxchg8b instruction used in
> >> counter_u64_fetch(), because this machine instruction always writes
> >> to memory, also when we only want to read and have
> >> (EDX:EAX) = (ECX:EBX):
> >>
> >> TEMP64 <- DEST
> >> IF (EDX:EAX = TEMP64)
> >>    THEN
> >>         ZF <- 1
> >>         DEST <- ECX:EBX
> >>    ELSE
> >>         ZF <- 0
> >>         EDX:EAX <- TEMP64
> >>         DEST <- TEMP64
> >> FI
> >>
> >> If one CPU increments the counter in pf_create_state() and another
> >> does the fetch, then both CPUs may run the cmpxchg8b at once with
> >> the chance that both read the same memory value into TEMP64 and the
> >> fetching CPU is the second CPU that writes, and so the increment is
> >> lost. That's what I see without the above patch two or three times
> >> a week.
> >
> > Please try the following patch. The idea is to make the value to
> > compare with unlikely to be equal to the memory content, for
> > fetch_one().
>
> During my research I first had the same idea, but it did not work. In
> the actual coding eax/edx is not well defined before cmpxchg8b is
> executed, but it does not help for the problem to do so.
>
> > Also it fixes a silly bug in zero_one().
> >
> > diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> > index 7fd26d2a960..aa20831ba18 100644
> > --- a/sys/i386/include/counter.h
> > +++ b/sys/i386/include/counter.h
> > @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
> >  	uint32_t res_lo, res_high;
> >
> >  	__asm __volatile(
> > +	"movl	(%0),%%eax\n\t"
> > +	"movl	4(%0),%%edx\n\t"
> > +	"addl	$0x1000,%%edx\n\t"	/* avoid write */
> >  	"movl	%%eax,%%ebx\n\t"
> >  	"movl	%%edx,%%ecx\n\t"
> >  	"cmpxchg8b (%2)"
>
> We can not avoid the write done by cmpxchg8b, as can be seen from the
> microcode given above; we always end up with "DEST <- TEMP64". From the
> Intel instruction reference manual:
>
>    The destination operand is written back if the comparison fails.
>    (The processor never produces a locked read without also producing
>    a locked write).

I see, the AMD APM is clearer there, stating that the instruction always
does a rmw regardless of the lock prefix.

> Maybe it is enough to prefix the cmpxchg8b with LOCK only in function
> counter_u64_read_one_8b(). I am not sure.

Let's switch to the IPI method for fetch, similar to clear. I do not
think that the cost of fetch is too important compared with the race.

> > @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
> >  {
> >  	__asm __volatile(
> > +"\n1:\n\t"
> >  	"movl	(%0),%%eax\n\t"
> > -	"movl	4(%0),%%edx\n"
> > +	"movl	4(%0),%%edx\n\t"
> >  	"xorl	%%ebx,%%ebx\n\t"
> >  	"xorl	%%ecx,%%ecx\n\t"
> > -"1:\n\t"
> >  	"cmpxchg8b (%0)\n\t"
> >  	"jnz	1b"
> >  	:
>
> If jnz jumps back, the instruction cmpxchg8b has loaded registers
> eax/edx with (%0), therefore I do not understand the silly bug.

Ignore me.

diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
index 7fd26d2a960..278f89123a4 100644
--- a/sys/i386/include/counter.h
+++ b/sys/i386/include/counter.h
@@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
 }

 #ifdef IN_SUBR_COUNTER_C
-static inline uint64_t
+struct counter_u64_fetch_cx8_arg {
+	uint64_t res;
+	uint64_t *p;
+};
+
+static uint64_t
 counter_u64_read_one_8b(uint64_t *p)
 {
 	uint32_t res_lo, res_high;
@@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
 	return (res_lo + ((uint64_t)res_high << 32));
 }

+static void
+counter_u64_fetch_cx8_one(void *arg1)
+{
+	struct counter_u64_fetch_cx8_arg *arg;
+	uint64_t val;
+
+	arg = arg1;
+	val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
+	    UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
+	atomic_add_64(&arg->res, val);
+}
+
 static inline uint64_t
 counter_u64_fetch_inline(uint64_t *p)
 {
+	struct counter_u64_fetch_cx8_arg arg;
 	uint64_t res;
 	int i;

@@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
 		}
 		critical_exit();
 	} else {
-		CPU_FOREACH(i)
-			res += counter_u64_read_one_8b((uint64_t *)((char *)p +
-			    UMA_PCPU_ALLOC_SIZE * i));
+		arg.p = p;
+		arg.res = 0;
+		smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, &arg);
+		res = arg.res;
 	}
 	return (res);
 }
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
>> I think the problem is the cmpxchg8b instruction used in
>> counter_u64_fetch(), because this machine instruction always writes to
>> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
>>
>> TEMP64 <- DEST
>> IF (EDX:EAX = TEMP64)
>>    THEN
>>         ZF <- 1
>>         DEST <- ECX:EBX
>>    ELSE
>>         ZF <- 0
>>         EDX:EAX <- TEMP64
>>         DEST <- TEMP64
>> FI
>>
>> If one CPU increments the counter in pf_create_state() and another
>> does the fetch, then both CPUs may run the cmpxchg8b at once with the
>> chance that both read the same memory value into TEMP64 and the
>> fetching CPU is the second CPU that writes, and so the increment is
>> lost. That's what I see without the above patch two or three times a
>> week.
>
> Please try the following patch. The idea is to make the value to
> compare with unlikely to be equal to the memory content, for
> fetch_one().

During my research I first had the same idea, but it did not work. In
the actual coding eax/edx is not well defined before cmpxchg8b is
executed, but it does not help for the problem to do so.

> Also it fixes a silly bug in zero_one().
>
> diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> index 7fd26d2a960..aa20831ba18 100644
> --- a/sys/i386/include/counter.h
> +++ b/sys/i386/include/counter.h
> @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
>  	uint32_t res_lo, res_high;
>
>  	__asm __volatile(
> +	"movl	(%0),%%eax\n\t"
> +	"movl	4(%0),%%edx\n\t"
> +	"addl	$0x1000,%%edx\n\t"	/* avoid write */
>  	"movl	%%eax,%%ebx\n\t"
>  	"movl	%%edx,%%ecx\n\t"
>  	"cmpxchg8b (%2)"

We can not avoid the write done by cmpxchg8b, as can be seen from the
microcode given above; we always end up with "DEST <- TEMP64". From the
Intel instruction reference manual:

   The destination operand is written back if the comparison fails.
   (The processor never produces a locked read without also producing a
   locked write).

Maybe it is enough to prefix the cmpxchg8b with LOCK only in function
counter_u64_read_one_8b().

> @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
>  {
>  	__asm __volatile(
> +"\n1:\n\t"
>  	"movl	(%0),%%eax\n\t"
> -	"movl	4(%0),%%edx\n"
> +	"movl	4(%0),%%edx\n\t"
>  	"xorl	%%ebx,%%ebx\n\t"
>  	"xorl	%%ecx,%%ecx\n\t"
> -"1:\n\t"
>  	"cmpxchg8b (%0)\n\t"
>  	"jnz	1b"
>  	:

If jnz jumps back, the instruction cmpxchg8b has loaded registers
eax/edx with (%0), therefore I do not understand the silly bug.

Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
[Keep me in Cc:]

On Thu, Jan 24, 2019 at 05:49:46PM +0100, Andreas Longwitz wrote:
> after some more long term research I have an educated guess what's
> going on in this problem.
>
> The problem only occurs on i386.
>
> If I replace the counter_u64_fetch() call in pf_state_expires() by the
> value of V_pf_status.states, then pf works without problems, the
> expire time zero problem is gone:
>
> --- pf.c.1st	2018-08-14 10:17:41.0 +0200
> +++ pf.c	2019-01-19 17:49:18.0 +0100
> @@ -1542,7 +1542,7 @@
>  	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
>  	if (start) {
>  		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
> -		states = counter_u64_fetch(state->rule.ptr->states_cur);
> +		states = V_pf_status.states;
>  	} else {
>  		start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
>  		end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
>
> The use of counter_u64_fetch() looks a little random to me. For all
> states not associated with the pf_default_rule the value of
> pf_status.states is used, and for me this value is ok for all rules.
>
> Further, the counter(9) framework was created for quick and lockless
> writes of counters, but fetching is more expensive. So I suggest
> letting pf_state_expires() work without a counter fetch.
>
> Further I can confirm that the counter states_cur of the
> pf_default_rule remains correct when the patch given above is active.
> Without the patch the counter on my main firewall machine slowly goes
> negative. I have verified this with a lot of live DTrace and kgdb
> script debugging.
>
> >> OK, in the meantime I did some more research and I am now quite
> >> sure the problem with the bogus pf_default_rule->states_cur counter
> >> is not a problem in pf. I am convinced it is a problem in
> >> counter(9) on i386 servers. The critical code is the machine
> >> instruction cmpxchg8b used in /sys/i386/include/counter.h.
> >>
> >> From the Intel instruction set reference manual:
> >> This instruction can be used with a LOCK prefix to allow the
> >> instruction to be executed atomically.
> >>
> >> We have two other sources in the kernel using cmpxchg8b:
> >> /sys/i386/include/atomic.h and
> >> /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> >
> > A single CPU instruction is atomic by definition, with regards to
> > the CPU. A preemption can not happen in the middle of an
> > instruction. What the "lock" prefix does is memory locking to avoid
> > unlocked parallel access to the same address by different CPUs.
> >
> > What is special about counter(9) is that %fs:%esi always points to a
> > per-CPU address, because %fs is unique for every CPU and is
> > constant, so no other CPU may write to this address, so the lock
> > prefix isn't needed.
> >
> > Of course a true SMP i386 isn't a well tested arch, so I won't
> > assert that counter(9) doesn't have bugs on this arch. However, I
> > don't see a lock prefix necessary here.
>
> I think the problem is the cmpxchg8b instruction used in
> counter_u64_fetch(), because this machine instruction always writes to
> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
>
> TEMP64 <- DEST
> IF (EDX:EAX = TEMP64)
>    THEN
>         ZF <- 1
>         DEST <- ECX:EBX
>    ELSE
>         ZF <- 0
>         EDX:EAX <- TEMP64
>         DEST <- TEMP64
> FI
>
> If one CPU increments the counter in pf_create_state() and another
> does the fetch, then both CPUs may run the cmpxchg8b at once with the
> chance that both read the same memory value into TEMP64 and the
> fetching CPU is the second CPU that writes, and so the increment is
> lost. That's what I see without the above patch two or three times a
> week.

Please try the following patch. The idea is to make the value to compare
with unlikely to be equal to the memory content, for fetch_one(). Also
it fixes a silly bug in zero_one().

diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
index 7fd26d2a960..aa20831ba18 100644
--- a/sys/i386/include/counter.h
+++ b/sys/i386/include/counter.h
@@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
 	uint32_t res_lo, res_high;

 	__asm __volatile(
+	"movl	(%0),%%eax\n\t"
+	"movl	4(%0),%%edx\n\t"
+	"addl	$0x1000,%%edx\n\t"	/* avoid write */
 	"movl	%%eax,%%ebx\n\t"
 	"movl	%%edx,%%ecx\n\t"
 	"cmpxchg8b (%2)"
@@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
 {
 	__asm __volatile(
+"\n1:\n\t"
 	"movl	(%0),%%eax\n\t"
-	"movl	4(%0),%%edx\n"
+	"movl	4(%0),%%edx\n\t"
 	"xorl	%%ebx,%%ebx\n\t"
 	"xorl	%%ecx,%%ecx\n\t"
-"1:\n\t"
 	"cmpxchg8b (%0)\n\t"
 	"jnz	1b"
 	:
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
after some more long term research I have an educated guess what's going
on in this problem.

The problem only occurs on i386.

If I replace the counter_u64_fetch() call in pf_state_expires() by the
value of V_pf_status.states, then pf works without problems, the expire
time zero problem is gone:

--- pf.c.1st	2018-08-14 10:17:41.0 +0200
+++ pf.c	2019-01-19 17:49:18.0 +0100
@@ -1542,7 +1542,7 @@
 	start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
 	if (start) {
 		end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
-		states = counter_u64_fetch(state->rule.ptr->states_cur);
+		states = V_pf_status.states;
 	} else {
 		start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
 		end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];

The use of counter_u64_fetch() looks a little random to me. For all
states not associated with the pf_default_rule the value of
pf_status.states is used, and for me this value is ok for all rules.

Further, the counter(9) framework was created for quick and lockless
writes of counters, but fetching is more expensive. So I suggest letting
pf_state_expires() work without a counter fetch.

Further I can confirm that the counter states_cur of the pf_default_rule
remains correct when the patch given above is active. Without the patch
the counter on my main firewall machine slowly goes negative. I have
verified this with a lot of live DTrace and kgdb script debugging.

>> OK, in the meantime I did some more research and I am now quite sure
>> the problem with the bogus pf_default_rule->states_cur counter is not
>> a problem in pf. I am convinced it is a problem in counter(9) on i386
>> servers. The critical code is the machine instruction cmpxchg8b used
>> in /sys/i386/include/counter.h.
>>
>> From the Intel instruction set reference manual:
>> This instruction can be used with a LOCK prefix to allow the
>> instruction to be executed atomically.
>>
>> We have two other sources in the kernel using cmpxchg8b:
>> /sys/i386/include/atomic.h and
>> /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
>
> A single CPU instruction is atomic by definition, with regards to the
> CPU. A preemption can not happen in the middle of an instruction. What
> the "lock" prefix does is memory locking to avoid unlocked parallel
> access to the same address by different CPUs.
>
> What is special about counter(9) is that %fs:%esi always points to a
> per-CPU address, because %fs is unique for every CPU and is constant,
> so no other CPU may write to this address, so the lock prefix isn't
> needed.
>
> Of course a true SMP i386 isn't a well tested arch, so I won't assert
> that counter(9) doesn't have bugs on this arch. However, I don't see a
> lock prefix necessary here.

I think the problem is the cmpxchg8b instruction used in
counter_u64_fetch(), because this machine instruction always writes to
memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):

TEMP64 <- DEST
IF (EDX:EAX = TEMP64)
   THEN
        ZF <- 1
        DEST <- ECX:EBX
   ELSE
        ZF <- 0
        EDX:EAX <- TEMP64
        DEST <- TEMP64
FI

If one CPU increments the counter in pf_create_state() and another does
the fetch, then both CPUs may run the cmpxchg8b at once with the chance
that both read the same memory value into TEMP64 and the fetching CPU is
the second CPU that writes, and so the increment is lost. That's what I
see without the above patch two or three times a week.

Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Thank you all for explanation how counter(9) works in detail. > A single CPU instruction is atomic by definition, with regards to the CPU. > A preemption can not happen in a middle of instruction. What the "lock" > prefix does is memory locking to avoid unlocked parallel access to the > same address by different CPUs. OK, my view of "atomic" in this context was wrong. > No, it does not look correct. The only atomicity guarantee that is required > from the counter.h inc and zero methods are atomicity WRT context switches. > The instructions are always executed on the CPU which owns the PCPU element > in the counter array, and since the update is executed as single instruction, > it does not require more expensive cache line lock AKA LOCK prefix. This > is the main feature of the counters on x86. > > It might read bogus value when fetching the counter but counter.h KPI only > guarantee is that the readouts are mostly correct. If you have systematically > wrong value always read, there is probably something different going on. On one of my two failing servers I have eliminated all "rdr pass" rules, so counter(9) is not used at the moment for pf_default_rule.states_cur. Using DTrace I can see the negative value -49 for this counter: CPU IDFUNCTION:NAME 3 1 :BEGIN feature=bfebfbff, ncpus=4 pf_default_rule.states_cur=0xc82cb3c8 0xc82cb3c8: counter0=0x007bd25b 0xc82cb7c8: counter1=0xffd32262 0xc82cbbc8: counter2=0xffd87de1 0xc82cbfc8: counter3=0xffd88d31 counter =0xffcf On my other concerned server I have introduces a panic call as soon as the counter value returned by counter_u64_fetch() in pf_state_expires() will become negative. So I will wait for the panic and hope for more information from the kerneldump. There is one unusual configuration on the two servers: they use pf and ipfw/ipdivert at once. The reason for this is my use of natd for incoming ftp requests to my ftp server, pf handles all the other traffic. 
This configuration is a little bit tricky but has worked correctly for many years. One exception was a recent problem with a buggy remote ftp client I had to debug. During this period I had to restart/reload ipfw and natd a couple of times. Because pf also has a reference to ipdivert, perhaps there is a hidden interaction with the expire problem of pf. Annotation: the buggy ftp client revealed a problem in natd (PR 230755). Regards, Andreas ___ freebsd-pf@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-pf To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On Tue, Nov 13, 2018 at 11:17:47PM +0100, Kristof Provost wrote: > On 13 Nov 2018, at 22:01, Andreas Longwitz wrote: > >> > >> Are there any hints why the counter pf_default_rule->states_cur > >> could get a negative value ? > >> > >> I’m afraid I have no idea right now. > >> > > > > OK, in the meantime I did some more research and I am now quite sure > > the > > problem with the bogus pf_default_rule->states_cur counter is not a > > problem in pf. I am convinced it is a problem in counter(9) on an i386 > > server. The critical code is the machine instruction cmpxchg8b used in > > /sys/i386/include/counter.h. > > > I’m always happy to hear problems aren’t my fault :) > > >> From the intel instruction set reference manual: > > This instruction can be used with a LOCK prefix to allow the instruction > > to > > be executed atomically. > > > > We have two other sources in the kernel using cmpxchg8b: > > /sys/i386/include/atomic.h and > > /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S > > > > Both make use of the LOCK feature; in atomic.h a detailed explanation > > is > > given. Because counter.h lacks the LOCK prefix I propose the following > > patch to get around the leak: > > > > --- counter.h.orig 2015-07-03 16:45:36.0 +0200 > > +++ counter.h 2018-11-13 16:07:20.329053000 +0100 > > @@ -60,6 +60,7 @@ > > "movl %%edx,%%ecx\n\t" > > "addl (%%edi),%%ebx\n\t" > > "adcl 4(%%edi),%%ecx\n\t" > > + "lock \n\t" > > "cmpxchg8b %%fs:(%%esi)\n\t" > > "jnz 1b" > > : > > @@ -76,6 +77,7 @@ > > __asm __volatile( > > "movl %%eax,%%ebx\n\t" > > "movl %%edx,%%ecx\n\t" > > + "lock \n\t" > > "cmpxchg8b (%2)" > > : "=a" (res_lo), "=d"(res_high) > > : "SD" (p) > > @@ -121,6 +123,7 @@ > > "xorl %%ebx,%%ebx\n\t" > > "xorl %%ecx,%%ecx\n\t" > > "1:\n\t" > > + "lock \n\t" > > "cmpxchg8b (%0)\n\t" > > "jnz 1b" > > : > > > That looks very plausible. I’m somewhat out of my depth here, so I’d > like the authors of the counter code to take a look at it. No, it does not look correct.
The only atomicity guarantee that is required from the counter.h inc and zero methods is atomicity WRT context switches. The instructions are always executed on the CPU which owns the PCPU element in the counter array, and since the update is executed as a single instruction, it does not require the more expensive cache line lock AKA LOCK prefix. This is the main feature of the counters on x86. It might read a bogus value when fetching the counter, but the counter.h KPI's only guarantee is that the readouts are mostly correct. If you always read a systematically wrong value, there is probably something different going on.
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On Tue, Nov 13, 2018 at 10:01:14PM +0100, Andreas Longwitz wrote: A> OK, in the meantime I did some more research and I am now quite sure the A> problem with the bogus pf_default_rule->states_cur counter is not a A> problem in pf. I am convinced it is a problem in counter(9) on an i386 A> server. The critical code is the machine instruction cmpxchg8b used in A> /sys/i386/include/counter.h. A> A> From the intel instruction set reference manual: A> This instruction can be used with a LOCK prefix to allow the instruction to A> be executed atomically. A> A> We have two other sources in the kernel using cmpxchg8b: A> /sys/i386/include/atomic.h and A> /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S A single CPU instruction is atomic by definition, with regards to the CPU. A preemption can not happen in the middle of an instruction. What the "lock" prefix does is memory locking to avoid unlocked parallel access to the same address by different CPUs. What is special about counter(9) is that %fs:%esi always points to a per-CPU address, because %fs is unique for every CPU and is constant, so no other CPU may write to this address, so the lock prefix isn't needed. Of course a true SMP i386 isn't a well-tested arch, so I won't assert that counter(9) doesn't have bugs on this arch. However, I don't see the lock prefix as necessary here. -- Gleb Smirnoff
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On 13 Nov 2018, at 22:01, Andreas Longwitz wrote: Are there any hints why the counter pf_default_rule->states_cur could get a negative value ? I’m afraid I have no idea right now. OK, in the meantime I did some more research and I am now quite sure the problem with the bogus pf_default_rule->states_cur counter is not a problem in pf. I am convinced it is a problem in counter(9) on an i386 server. The critical code is the machine instruction cmpxchg8b used in /sys/i386/include/counter.h. I’m always happy to hear problems aren’t my fault :) From the intel instruction set reference manual: This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. We have two other sources in the kernel using cmpxchg8b: /sys/i386/include/atomic.h and /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S Both make use of the LOCK feature; in atomic.h a detailed explanation is given. Because counter.h lacks the LOCK prefix I propose the following patch to get around the leak: --- counter.h.orig 2015-07-03 16:45:36.0 +0200 +++ counter.h 2018-11-13 16:07:20.329053000 +0100 @@ -60,6 +60,7 @@ "movl %%edx,%%ecx\n\t" "addl (%%edi),%%ebx\n\t" "adcl 4(%%edi),%%ecx\n\t" + "lock \n\t" "cmpxchg8b %%fs:(%%esi)\n\t" "jnz 1b" : @@ -76,6 +77,7 @@ __asm __volatile( "movl %%eax,%%ebx\n\t" "movl %%edx,%%ecx\n\t" + "lock \n\t" "cmpxchg8b (%2)" : "=a" (res_lo), "=d"(res_high) : "SD" (p) @@ -121,6 +123,7 @@ "xorl %%ebx,%%ebx\n\t" "xorl %%ecx,%%ecx\n\t" "1:\n\t" + "lock \n\t" "cmpxchg8b (%0)\n\t" "jnz 1b" : That looks very plausible. I’m somewhat out of my depth here, so I’d like the authors of the counter code to take a look at it. Best regards, Kristof
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
> > Are there any hints why the counter pf_default_rule->states_cur > could get a negative value ? > > I’m afraid I have no idea right now. > OK, in the meantime I did some more research and I am now quite sure the problem with the bogus pf_default_rule->states_cur counter is not a problem in pf. I am convinced it is a problem in counter(9) on an i386 server. The critical code is the machine instruction cmpxchg8b used in /sys/i386/include/counter.h. From the intel instruction set reference manual: This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. We have two other sources in the kernel using cmpxchg8b: /sys/i386/include/atomic.h and /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S Both make use of the LOCK feature; in atomic.h a detailed explanation is given. Because counter.h lacks the LOCK prefix I propose the following patch to get around the leak: --- counter.h.orig 2015-07-03 16:45:36.0 +0200 +++ counter.h 2018-11-13 16:07:20.329053000 +0100 @@ -60,6 +60,7 @@ "movl %%edx,%%ecx\n\t" "addl (%%edi),%%ebx\n\t" "adcl 4(%%edi),%%ecx\n\t" + "lock \n\t" "cmpxchg8b %%fs:(%%esi)\n\t" "jnz 1b" : @@ -76,6 +77,7 @@ __asm __volatile( "movl %%eax,%%ebx\n\t" "movl %%edx,%%ecx\n\t" + "lock \n\t" "cmpxchg8b (%2)" : "=a" (res_lo), "=d"(res_high) : "SD" (p) @@ -121,6 +123,7 @@ "xorl %%ebx,%%ebx\n\t" "xorl %%ecx,%%ecx\n\t" "1:\n\t" + "lock \n\t" "cmpxchg8b (%0)\n\t" "jnz 1b" : Kind regards, Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
> In the problem I have reported for states of "rdr pass" rules I see > start=6000, end=12000, timeout=86400 and (obviously erroneous, probably > negative) states=0xffffffd0. > > I have no idea how that can happen. Just to make sure I understand: you > know that states is negative here because of a printf() or SDT addition > in pf_state_expires(), right? I did not change the kernel, I use DTrace on my firewall server fwextern. In pf.conf I have changed all productive "rdr pass" rules to a rdr rule and an extra filter rule. Now only one "rdr pass" rule is left for testing: rdr pass on $if_internet proto tcp from 31.17.172.227 to $ip_internet port 8022 -> 10.0.0.254 Now I start the following DTrace script pfcounter.d, which will be active when a SYN on port 8022 arrives: #!/usr/sbin/dtrace -s fbt::pf_normalize_tcp:entry /((*(args[2]->m_hdr.mh_data + 33)) & 0x02) == 0x02 && htons(*(short *)(args[2]->m_hdr.mh_data + 22)) == 8022/ /* SYN + port 8022 */ { self->flag1 = 1; } fbt::pf_test:return /self->flag1/ { self->flag1 = 0; } fbt::pfioctl:entry /args[1] == 3221767193 && ((struct pfioc_states *)args[2])->ps_len != 0/ /* DIOCGETSTATES && len != 0 */ { self->flag2 = 1; } fbt::counter_u64_fetch:entry /self->flag2/ { } fbt::counter_u64_fetch:return /self->flag2/ { printf("returncode (states_cur)=%d / 0x%x", args[1], args[1]); } fbt::pfioctl:return /self->flag2/ { self->flag2 = 0; } Now I run on my remote test client (IP 31.17.172.227) the command ssh -p 8022 fwextern sleep 20 This creates on fwextern a state for the "rdr pass" rule with expire time zero.
I must be quick to run "pfctl -vss" on fwextern to see this state, and the output of the DTrace script shows me the "negative" value of the counter: === root@fwextern (pts/0) -> ./pfcounter.d dtrace: script './pfcounter.d' matched 6 probes CPU ID FUNCTION:NAME 3 17624 counter_u64_fetch:entry 3 17625 counter_u64_fetch:return returncode (states_cur)=4294967248 / 0xffffffd0 If I run the ssh command on the test client twice, then the counter is one less negative than before: === root@fwextern (pts/0) -> ./pfcounter.d dtrace: script './pfcounter.d' matched 6 probes CPU ID FUNCTION:NAME 3 17624 counter_u64_fetch:entry 3 17625 counter_u64_fetch:return returncode (states_cur)=4294967249 / 0xffffffd1 3 17624 counter_u64_fetch:entry 3 17625 counter_u64_fetch:return returncode (states_cur)=4294967249 / 0xffffffd1 Because of "sleep 20" the ssh command does not return and must be killed. I have observed the problem on two of my firewall servers; the pf rules were never reloaded since boot. I think there must be an unknown event in the past that triggered the negative counter value. I will try to add a statement to the kernel that recognizes the problem and go back to the "rdr pass" rules, so next time the problem occurs we have more information than now. Kind regards, Andreas
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On 27 Oct 2018, at 5:22, Andreas Longwitz wrote: Thanks very much for the answer, especially for the hint to OpenBSD. I wonder if there’s an integer overflow in the pf_state_expires() calculation. The OpenBSD people have a cast to u_int64_t in their version: timeout = (u_int64_t)timeout * (end - states) / (end - start); Perhaps this would fix your problem? (Untested, not even compiled) if (end && states > start && start < end) { if (states < end) { timeout = (uint64_t)timeout * (end - states) / (end - start); return (state->expire + timeout); } else return (time_uptime); } return (state->expire + timeout); I can confirm the patch of the OpenBSD people adding the uint64_t cast makes sense. If timeout * (end - states) becomes bigger than UINT32_MAX (I am on i386) the cast prevents the overflow of this product and the result of the adaptive calculation will always be correct. Example: start=6000, end=12000, timeout=86400 * 5 (5 days), states=100: result 140972, result with cast patch 856800. In the problem I have reported for states of "rdr pass" rules I see start=6000, end=12000, timeout=86400 and (obviously erroneous, probably negative) states=0xffffffd0. I have no idea how that can happen. Just to make sure I understand: you know that states is negative here because of a printf() or SDT addition in pf_state_expires(), right? Further the counter variable for states_cur of pf_default_rule is used for all "rdr/nat/binat pass" rules together. This was a little bit surprising for me, but I think this is intended behaviour. Correct? Yes. Are there any hints why the counter pf_default_rule->states_cur could get a negative value? I’m afraid I have no idea right now. Best regards, Kristof
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Thanks very much for the answer, especially for the hint to OpenBSD. > I wonder if there’s an integer overflow in the pf_state_expires() > calculation. > The OpenBSD people have a cast to u_int64_t in their version: > > timeout = (u_int64_t)timeout * (end - states) / (end - start); > > Perhaps this would fix your problem? (Untested, not even compiled) > > if (end && states > start && start < end) { > if (states < end) { > timeout = (uint64_t)timeout * (end - states) / (end - > start); > return (state->expire + timeout); > } > else > return (time_uptime); > } > return (state->expire + timeout); I can confirm the patch of the OpenBSD people adding the uint64_t cast makes sense. If timeout * (end - states) becomes bigger than UINT32_MAX (I am on i386) the cast prevents the overflow of this product and the result of the adaptive calculation will always be correct. Example: start=6000, end=12000, timeout=86400 * 5 (5 days), states=100: result 140972, result with cast patch 856800. In the problem I have reported for states of "rdr pass" rules I see start=6000, end=12000, timeout=86400 and (obviously erroneous, probably negative) states=0xffffffd0. Therefore the condition "states < end" is false and instead of doing the adaptive calculation the function pf_state_expires() returns time_uptime. In the code snippet start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START]; if (start) { end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END]; states = counter_u64_fetch(state->rule.ptr->states_cur); } else { start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START]; end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END]; states = V_pf_status.states; } I see start=6000, so the variable states with value 0xffffffd0 is held in a counter variable. Based on these facts there are two questions (please note that I do not have any adaptive statements in my pf.conf): 1. Why do I see start=6000 for states of "rdr pass" rules?
For all states of filter rules I always see start=0 and therefore a counter variable is never used. 2. The counter variable states_cur is changed exclusively by the macros STATE_INC/DEC_COUNTERS, so why can one of them be negative? The answer to question 1 I can give myself: All states of filter rules include a pointer to the underlying rule, which includes counter variables too. A state of a "rdr pass" rule does not have such an underlying rule, so the global pf_default_rule is used. That means in the first line of the code snippet above we have state->rule.ptr == pf_default_rule, and the value of timeout[PFTM_ADAPTIVE_START] indeed is initialized with 6000. Further the counter variable for states_cur of pf_default_rule is used for all "rdr/nat/binat pass" rules together. This was a little bit surprising for me, but I think this is intended behaviour. Correct? I still don't have an answer for question 2. My problem is best described in the log of commit 263029: Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. I run FreeBSD 10 r338093 and of course have this and the following commits included. Are there any hints why the counter pf_default_rule->states_cur could get a negative value? --- Andreas Longwitz
Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On 15 Oct 2018, at 15:26, Andreas Longwitz wrote: On two of my FreeBSD 10 (r338093) firewall servers some incoming ssh connections stopped working because pf started to create states with expire time zero (instead of 86400 sec) for translation statements like rdr pass on em0 proto tcp from any to myip port 8022 -> 10.0.0.254 Therefore a command given on a remote server like ssh -p 8022 myip sleep 15 did not return, because the created state for the connection was purged by pf (interval 10 sec) before 15 seconds. If I replace the rdr pass rule with a rdr rule and a separate filter pass rule then the created state always has expire time 24h and everything is ok. I have tried to analyze the bug in the rdr pass rule. Immediately after starting the above ssh command on the remote server I see with pfctl -vss the created state on my firewall machine: all tcp 10.0.0.254:8022 (myip:8022) <- remoteip:59233 ESTABLISHED:ESTABLISHED [1443872812 + 66608] wscale 6 [1365307391 + 66560] wscale 3 age 00:00:00, expires in 00:00:00, 13:12 pkts, 2955:3306 bytes and a DTrace script running at the same time gives 3 19323 pfsync_state_export:entry creation=4491391. expire=4491391, state_timeout=2 3 16459 pf_state_expires:entry state_timeout=86400, start_timeout=6000, end_timeout=12000 states=882 3 17624 counter_u64_fetch:entry 3 17625 counter_u64_fetch:return returncode (states_cur)=4294967248 = 0xffffffd0 3 16460 pf_state_expires:return returncode=4491391 3 19324 pfsync_state_export:return creation=0. expire=0, syncstate_timeout=2 0 12730 pfioctl:return returncode=0, time_uptime=4491391 So pf_state_expires() returns the value time_uptime and therefore pfsync_state_export() gives expire zero. Reason for this is the very big (meaning negative) value returned by the function counter_u64_fetch() for states_cur. This variable is of type uint64_t and only incremented and decremented by the macros STATE_INC_COUNTERS and STATE_DEC_COUNTERS.
I wonder if there’s an integer overflow in the pf_state_expires() calculation. The OpenBSD people have a cast to u_int64_t in their version: timeout = (u_int64_t)timeout * (end - states) / (end - start); Perhaps this would fix your problem? (Untested, not even compiled) if (end && states > start && start < end) { if (states < end) { timeout = (uint64_t)timeout * (end - states) / (end - start); return (state->expire + timeout); } else return (time_uptime); } return (state->expire + timeout); } Best regards, Kristof
rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
On two of my FreeBSD 10 (r338093) firewall servers some incoming ssh connections stopped working because pf started to create states with expire time zero (instead of 86400 sec) for translation statements like rdr pass on em0 proto tcp from any to myip port 8022 -> 10.0.0.254 Therefore a command given on a remote server like ssh -p 8022 myip sleep 15 did not return, because the created state for the connection was purged by pf (interval 10 sec) before 15 seconds. If I replace the rdr pass rule with a rdr rule and a separate filter pass rule then the created state always has expire time 24h and everything is ok. I have tried to analyze the bug in the rdr pass rule. Immediately after starting the above ssh command on the remote server I see with pfctl -vss the created state on my firewall machine: all tcp 10.0.0.254:8022 (myip:8022) <- remoteip:59233 ESTABLISHED:ESTABLISHED [1443872812 + 66608] wscale 6 [1365307391 + 66560] wscale 3 age 00:00:00, expires in 00:00:00, 13:12 pkts, 2955:3306 bytes and a DTrace script running at the same time gives 3 19323 pfsync_state_export:entry creation=4491391. expire=4491391, state_timeout=2 3 16459 pf_state_expires:entry state_timeout=86400, start_timeout=6000, end_timeout=12000 states=882 3 17624 counter_u64_fetch:entry 3 17625 counter_u64_fetch:return returncode (states_cur)=4294967248 = 0xffffffd0 3 16460 pf_state_expires:return returncode=4491391 3 19324 pfsync_state_export:return creation=0. expire=0, syncstate_timeout=2 0 12730 pfioctl:return returncode=0, time_uptime=4491391 So pf_state_expires() returns the value time_uptime and therefore pfsync_state_export() gives expire zero. Reason for this is the very big (meaning negative) value returned by the function counter_u64_fetch() for states_cur. This variable is of type uint64_t and only incremented and decremented by the macros STATE_INC_COUNTERS and STATE_DEC_COUNTERS.
I see two possibilities: Maybe counter(9) is broken for my hardware (i386, cpu_feature=0xbfebfbff) or STATE_DEC_COUNTERS sometimes is called without STATE_INC_COUNTERS being called before. In the above example the value of states_cur is 1 on a freshly booted system, so we may have a difference of 49 in calls of the COUNTER macros. I see the same number 49 in the output of pfctl -si in the line for short packets. This may be a hint or it may be random. -- Andreas Longwitz