Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-23 Thread Kristof Provost

On 19 Feb 2019, at 22:53, Andreas Longwitz wrote:

Kristof Provost wrote:


Because fetching a counter is a rather expansive function we 
should use
counter_u64_fetch() in pf_state_expires() only when necessary. A 
"rdr
pass" rule should not cause more effort than separate "rdr" and 
"pass"

rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

For a small gain in performance especially for "rdr pass" rules I
suggest something like

--- pf.c.orig 2019-02-18 17:49:22.944751000 +0100
+++ pf.c 2019-02-18 17:55:07.396163000 +0100
@@ -1558,7 +1558,7 @@
if (!timeout)
   timeout = V_pf_default_rule.timeout[state->timeout];
start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
  - if (start) {
  + if (start && state->rule.ptr != _pf_default_rule) {
   end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
   states = counter_u64_fetch(state->rule.ptr->states_cur);
} else {

I think that looks correct. Do you have any performance measurements 
on

this?


No

Although presumably it only really matters in cases where there’s 
no

explicit catch-all rule, so I do wonder if it’s worth it.


Sorry, but I do not understand this argument.


From manpage:

   The adaptive timeout values can be defined both globally and for
   each rule.  When used on a per-rule basis, the values relate to the
   number of states created by the rule, otherwise to the total number
   of states.

This handling of adaptive timeouts is done in pf_state_expires():

 start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
 if (start) {
 end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
 states = counter_u64_fetch(state->rule.ptr->states_cur);
 } else {
 start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
 end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
 states = V_pf_status.states;
 }

The following calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
   runs in the else-section of the code snippet and therefore takes
   "start" and "end" from the global default settings and sets 
"states"

   to pf_status.states (= total number of states).

2. Special rules like
   "pass .. keep state (adaptive.start 500 adaptive.end 1000)"
   have start != 0, run in the if-section of the code snippet and take
   "start" and "end" from the rule and set "states" to the number of
   states created by their rule using counter_u64_fetch().

Thats all ok, but there is a third case without special handling in 
the

above code snippet:

3. All "rdr/nat pass .." statements use together the pf_default_rule.
   Therefore we have "start != 0" in this case and we run the
   if-section of the code snippet but we better should run the
   else-section in this case and do not fetch the counter of the
   pf_default_rule but take the total number of states.

Thats what the patch does.


Thank you, that makes sense. I’d missed the third case.

The patch is in my queue and should get committed soon. Your explanation 
makes a great commit message.


Regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-19 Thread Andreas Longwitz
Kristof Provost wrote:
> 
> Because fetching a counter is a rather expansive function we should use
> counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
> pass" rule should not cause more effort than separate "rdr" and "pass"
> rules. For rules with adaptive timeout values the call of
> counter_u64_fetch() should be accepted, but otherwise not.
> 
> For a small gain in performance especially for "rdr pass" rules I
> suggest something like
> 
> --- pf.c.orig 2019-02-18 17:49:22.944751000 +0100
> +++ pf.c 2019-02-18 17:55:07.396163000 +0100
> @@ -1558,7 +1558,7 @@
> if (!timeout)
>timeout = V_pf_default_rule.timeout[state->timeout];
> start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
>   - if (start) {
>   + if (start && state->rule.ptr != _pf_default_rule) {
>end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
>states = counter_u64_fetch(state->rule.ptr->states_cur);
> } else {
> 
> I think that looks correct. Do you have any performance measurements on
> this?

No

> Although presumably it only really matters in cases where there’s no
> explicit catch-all rule, so I do wonder if it’s worth it.

Sorry, but I do not understand this argument.

From manpage:
   The adaptive timeout values can be defined both globally and for
   each rule.  When used on a per-rule basis, the values relate to the
   number of states created by the rule, otherwise to the total number
   of states.

This handling of adaptive timeouts is done in pf_state_expires():

 start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
 if (start) {
 end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
 states = counter_u64_fetch(state->rule.ptr->states_cur);
 } else {
 start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
 end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
 states = V_pf_status.states;
 }

The following calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
   runs in the else-section of the code snippet and therefore takes
   "start" and "end" from the global default settings and sets "states"
   to pf_status.states (= total number of states).

2. Special rules like
   "pass .. keep state (adaptive.start 500 adaptive.end 1000)"
   have start != 0, run in the if-section of the code snippet and take
   "start" and "end" from the rule and set "states" to the number of
   states created by their rule using counter_u64_fetch().

Thats all ok, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements use together the pf_default_rule.
   Therefore we have "start != 0" in this case and we run the
   if-section of the code snippet but we better should run the
   else-section in this case and do not fetch the counter of the
   pf_default_rule but take the total number of states.

Thats what the patch does.


Regards
Andreas
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-18 Thread Kristof Provost

On 18 Feb 2019, at 18:30, Andreas Longwitz wrote:
Ok, thanks, I will commit the patch shortly.  I do not see a point in 
waiting

for two more weeks, sure report me if anything goes wrong.


your patch for counter(9) on i386 definitely solves my problem 
discussed

in this thread.

Because fetching a counter is a rather expansive function we should 
use

counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

For a small gain in performance especially for "rdr pass" rules I
suggest something like

--- pf.c.orig   2019-02-18 17:49:22.944751000 +0100
+++ pf.c2019-02-18 17:55:07.396163000 +0100
@@ -1558,7 +1558,7 @@
if (!timeout)
timeout = V_pf_default_rule.timeout[state->timeout];
start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
-   if (start) {
+   if (start && state->rule.ptr != _pf_default_rule) {
end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
states = 
counter_u64_fetch(state->rule.ptr->states_cur);

} else {

I think that looks correct. Do you have any performance measurements on 
this?


Although presumably it only really matters in cases where there’s no 
explicit catch-all rule, so I do wonder if it’s worth it.


Regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-18 Thread Gleb Smirnoff
  Thanks, Andreas!

Kristof, will you handle that? If you are busy, I can try to refresh
my memory.

On Mon, Feb 18, 2019 at 06:30:32PM +0100, Andreas Longwitz wrote:
A> Hello,
A> 
A> > Ok, thanks, I will commit the patch shortly.  I do not see a point in 
waiting
A> > for two more weeks, sure report me if anything goes wrong.
A> 
A> your patch for counter(9) on i386 definitely solves my problem discussed
A> in this thread.
A> 
A> Because fetching a counter is a rather expansive function we should use
A> counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
A> pass" rule should not cause more effort than separate "rdr" and "pass"
A> rules. For rules with adaptive timeout values the call of
A> counter_u64_fetch() should be accepted, but otherwise not.
A> 
A> For a small gain in performance especially for "rdr pass" rules I
A> suggest something like
A> 
A> --- pf.c.orig   2019-02-18 17:49:22.944751000 +0100
A> +++ pf.c2019-02-18 17:55:07.396163000 +0100
A> @@ -1558,7 +1558,7 @@
A> if (!timeout)
A> timeout = V_pf_default_rule.timeout[state->timeout];
A> start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
A> -   if (start) {
A> +   if (start && state->rule.ptr != _pf_default_rule) {
A> end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
A> states = counter_u64_fetch(state->rule.ptr->states_cur);
A> } else {
A> 
A> -- 
A> Andreas
A> 

-- 
Gleb Smirnoff
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-18 Thread Andreas Longwitz
Hello,

> Ok, thanks, I will commit the patch shortly.  I do not see a point in waiting
> for two more weeks, sure report me if anything goes wrong.

your patch for counter(9) on i386 definitely solves my problem discussed
in this thread.

Because fetching a counter is a rather expansive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

For a small gain in performance especially for "rdr pass" rules I
suggest something like

--- pf.c.orig   2019-02-18 17:49:22.944751000 +0100
+++ pf.c2019-02-18 17:55:07.396163000 +0100
@@ -1558,7 +1558,7 @@
if (!timeout)
timeout = V_pf_default_rule.timeout[state->timeout];
start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
-   if (start) {
+   if (start && state->rule.ptr != _pf_default_rule) {
end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
states = counter_u64_fetch(state->rule.ptr->states_cur);
} else {

-- 
Andreas
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-02 Thread Konstantin Belousov
On Sat, Feb 02, 2019 at 11:26:45AM +0100, Andreas Longwitz wrote:
> Hello,
> 
> > Lets switch to IPI method for fetch, similar to clear.
> > I do not think that the cost of fetch is too important comparing with
> > the race.
> > 
> > diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> > index 7fd26d2a960..278f89123a4 100644
> > --- a/sys/i386/include/counter.h
> > +++ b/sys/i386/include/counter.h
> > @@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
> >  }
> >  
> >  #ifdef IN_SUBR_COUNTER_C
> > -static inline uint64_t
> > +struct counter_u64_fetch_cx8_arg {
> > +   uint64_t res;
> > +   uint64_t *p;
> > +};
> > +
> > +static uint64_t
> >  counter_u64_read_one_8b(uint64_t *p)
> >  {
> > uint32_t res_lo, res_high;
> > @@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
> > return (res_lo + ((uint64_t)res_high << 32));
> >  }
> >  
> > +static void
> > +counter_u64_fetch_cx8_one(void *arg1)
> > +{
> > +   struct counter_u64_fetch_cx8_arg *arg;
> > +   uint64_t val;
> > +
> > +   arg = arg1;
> > +   val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
> > +   UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
> > +   atomic_add_64(>res, val);
> > +}
> > +
> >  static inline uint64_t
> >  counter_u64_fetch_inline(uint64_t *p)
> >  {
> > +   struct counter_u64_fetch_cx8_arg arg;
> > uint64_t res;
> > int i;
> >  
> > @@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
> > }
> > critical_exit();
> > } else {
> > -   CPU_FOREACH(i)
> > -   res += counter_u64_read_one_8b((uint64_t *)((char *)p +
> > -   UMA_PCPU_ALLOC_SIZE * i));
> > +   arg.p = p;
> > +   arg.res = 0;
> > +   smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, );
> > +   res = arg.res;
> > }
> > return (res);
> >  }
> 
> 
> I have integrated this i386 counter(9) patch and using original pf.c to
> some of my test servers and everything runs fine. Today I have added my
> main firewall machine and will report in two weeks the result. I suppose
> running counter_u64_fetch() in parallel to counter_u64_add() is not a
> problem anymore.
Ok, thanks, I will commit the patch shortly.  I do not see a point in waiting
for two more weeks, sure report me if anything goes wrong.
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-02-02 Thread Andreas Longwitz
Hello,

> Lets switch to IPI method for fetch, similar to clear.
> I do not think that the cost of fetch is too important comparing with
> the race.
> 
> diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> index 7fd26d2a960..278f89123a4 100644
> --- a/sys/i386/include/counter.h
> +++ b/sys/i386/include/counter.h
> @@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
>  }
>  
>  #ifdef IN_SUBR_COUNTER_C
> -static inline uint64_t
> +struct counter_u64_fetch_cx8_arg {
> + uint64_t res;
> + uint64_t *p;
> +};
> +
> +static uint64_t
>  counter_u64_read_one_8b(uint64_t *p)
>  {
>   uint32_t res_lo, res_high;
> @@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
>   return (res_lo + ((uint64_t)res_high << 32));
>  }
>  
> +static void
> +counter_u64_fetch_cx8_one(void *arg1)
> +{
> + struct counter_u64_fetch_cx8_arg *arg;
> + uint64_t val;
> +
> + arg = arg1;
> + val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
> + UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
> + atomic_add_64(>res, val);
> +}
> +
>  static inline uint64_t
>  counter_u64_fetch_inline(uint64_t *p)
>  {
> + struct counter_u64_fetch_cx8_arg arg;
>   uint64_t res;
>   int i;
>  
> @@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
>   }
>   critical_exit();
>   } else {
> - CPU_FOREACH(i)
> - res += counter_u64_read_one_8b((uint64_t *)((char *)p +
> - UMA_PCPU_ALLOC_SIZE * i));
> + arg.p = p;
> + arg.res = 0;
> + smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, );
> + res = arg.res;
>   }
>   return (res);
>  }


I have integrated this i386 counter(9) patch and using original pf.c to
some of my test servers and everything runs fine. Today I have added my
main firewall machine and will report in two weeks the result. I suppose
running counter_u64_fetch() in parallel to counter_u64_add() is not a
problem anymore.

Andreas
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-01-25 Thread Konstantin Belousov
On Thu, Jan 24, 2019 at 11:09:37PM +0100, Andreas Longwitz wrote:
> >> I think the problem is the cmpxchg8b instruction used in
> >> counter_u64_fetch(), because this machine instruction always writes to
> >> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
> >>
> >> TEMP64 <- DEST
> >> IF (EDX:EAX = TEMP64)
> >>THEN
> >>   ZF <- 1
> >>   DEST <- ECX:EBX
> >>ELSE
> >>   ZF <- 0
> >>   EDX:EAX <- TEMP64
> >>   DEST <- TEMP64
> >> FI
> >>
> >> If one CPU increments the counter in pf_create_state() and another does
> >> the fetch, then both CPU's may run the xmpxschg8b at once with the
> >> chance that both read the same memory value in TEMP64 and the fetching
> >> CPU is the second CPU that writes and so the increment is lossed. Thats
> >> what I see without the above patch two or three times a week.
> > 
> > Please try the following patch.  The idea is to make the value to compare
> > with unlikely to be equal to the memory content, for fetch_one().
> 
> During my research I first had the same idea, but it did not work. In
> the actual coding eax/edx is not well defined before cmpxchg8b is
> executed, but it does not help for the problem to do so.
> 
> > Also it fixes a silly bug in zero_one().
> > 
> > diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> > index 7fd26d2a960..aa20831ba18 100644
> > --- a/sys/i386/include/counter.h
> > +++ b/sys/i386/include/counter.h
> > @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
> > uint32_t res_lo, res_high;
> >  
> > __asm __volatile(
> > +   "movl   (%0),%%eax\n\t"
> > +   "movl   4(%0),%%edx\n\t"
> > +   "addl   $0x1000,%%edx\n\t"  /* avoid write */
> > "movl   %%eax,%%ebx\n\t"
> > "movl   %%edx,%%ecx\n\t"
> > "cmpxchg8b  (%2)"
> 
> We can not avoid the write done by cmpxchg8b as can be seen from the
> microcode given above, we always end up with "DEST <- TEMP". From the
> Intel instruction reference manual:
> 
> The destination operand is written back if the comparision fails. (The
> processor never produces a locked read without also producing a locked
> write).
I see, AMD APM is more clear there, stating that the instruction always
do rmw regardless of lock prefix.

> 
> Maybe it is enough to prefix the cmpxchg8b with LOCK only in function
> counter_u64_read_one_8b().
I am not sure.  Lets switch to IPI method for fetch, similar to clear.
I do not think that the cost of fetch is too important comparing with
the race.

> 
> 
> > @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
> >  {
> > __asm __volatile(
> > +"\n1:\n\t"
> > "movl   (%0),%%eax\n\t"
> > -   "movl   4(%0),%%edx\n"
> > +   "movl   4(%0),%%edx\n\t"
> > "xorl   %%ebx,%%ebx\n\t"
> > "xorl   %%ecx,%%ecx\n\t"
> > -"1:\n\t"
> > "cmpxchg8b  (%0)\n\t"
> > "jnz1b"
> > :
> 
> If jnz jumps back the instruction cmpxchg8b has load registers eax/edx
> with (%0), therefor I do not understand the silly bug.
Ignore me.


diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
index 7fd26d2a960..278f89123a4 100644
--- a/sys/i386/include/counter.h
+++ b/sys/i386/include/counter.h
@@ -72,7 +72,12 @@ counter_64_inc_8b(uint64_t *p, int64_t inc)
 }
 
 #ifdef IN_SUBR_COUNTER_C
-static inline uint64_t
+struct counter_u64_fetch_cx8_arg {
+   uint64_t res;
+   uint64_t *p;
+};
+
+static uint64_t
 counter_u64_read_one_8b(uint64_t *p)
 {
uint32_t res_lo, res_high;
@@ -87,9 +92,22 @@ counter_u64_read_one_8b(uint64_t *p)
return (res_lo + ((uint64_t)res_high << 32));
 }
 
+static void
+counter_u64_fetch_cx8_one(void *arg1)
+{
+   struct counter_u64_fetch_cx8_arg *arg;
+   uint64_t val;
+
+   arg = arg1;
+   val = counter_u64_read_one_8b((uint64_t *)((char *)arg->p +
+   UMA_PCPU_ALLOC_SIZE * PCPU_GET(cpuid)));
+   atomic_add_64(>res, val);
+}
+
 static inline uint64_t
 counter_u64_fetch_inline(uint64_t *p)
 {
+   struct counter_u64_fetch_cx8_arg arg;
uint64_t res;
int i;
 
@@ -108,9 +126,10 @@ counter_u64_fetch_inline(uint64_t *p)
}
critical_exit();
} else {
-   CPU_FOREACH(i)
-   res += counter_u64_read_one_8b((uint64_t *)((char *)p +
-   UMA_PCPU_ALLOC_SIZE * i));
+   arg.p = p;
+   arg.res = 0;
+   smp_rendezvous(NULL, counter_u64_fetch_cx8_one, NULL, );
+   res = arg.res;
}
return (res);
 }
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-01-24 Thread Andreas Longwitz
>> I think the problem is the cmpxchg8b instruction used in
>> counter_u64_fetch(), because this machine instruction always writes to
>> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
>>
>> TEMP64 <- DEST
>> IF (EDX:EAX = TEMP64)
>>THEN
>>   ZF <- 1
>>   DEST <- ECX:EBX
>>ELSE
>>   ZF <- 0
>>   EDX:EAX <- TEMP64
>>   DEST <- TEMP64
>> FI
>>
>> If one CPU increments the counter in pf_create_state() and another does
>> the fetch, then both CPU's may run the xmpxschg8b at once with the
>> chance that both read the same memory value in TEMP64 and the fetching
>> CPU is the second CPU that writes and so the increment is lossed. Thats
>> what I see without the above patch two or three times a week.
> 
> Please try the following patch.  The idea is to make the value to compare
> with unlikely to be equal to the memory content, for fetch_one().

During my research I first had the same idea, but it did not work. In
the actual coding eax/edx is not well defined before cmpxchg8b is
executed, but it does not help for the problem to do so.

> Also it fixes a silly bug in zero_one().
> 
> diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> index 7fd26d2a960..aa20831ba18 100644
> --- a/sys/i386/include/counter.h
> +++ b/sys/i386/include/counter.h
> @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
>   uint32_t res_lo, res_high;
>  
>   __asm __volatile(
> + "movl   (%0),%%eax\n\t"
> + "movl   4(%0),%%edx\n\t"
> + "addl   $0x1000,%%edx\n\t"  /* avoid write */
>   "movl   %%eax,%%ebx\n\t"
>   "movl   %%edx,%%ecx\n\t"
>   "cmpxchg8b  (%2)"

We can not avoid the write done by cmpxchg8b as can be seen from the
microcode given above, we always end up with "DEST <- TEMP". From the
Intel instruction reference manual:

The destination operand is written back if the comparision fails. (The
processor never produces a locked read without also producing a locked
write).

Maybe it is enough to prefix the cmpxchg8b with LOCK only in function
counter_u64_read_one_8b().


> @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
>  {
>   __asm __volatile(
> +"\n1:\n\t"
>   "movl   (%0),%%eax\n\t"
> - "movl   4(%0),%%edx\n"
> + "movl   4(%0),%%edx\n\t"
>   "xorl   %%ebx,%%ebx\n\t"
>   "xorl   %%ecx,%%ecx\n\t"
> -"1:\n\t"
>   "cmpxchg8b  (%0)\n\t"
>   "jnz1b"
>   :

If jnz jumps back the instruction cmpxchg8b has load registers eax/edx
with (%0), therefor I do not understand the silly bug.


Andreas
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-01-24 Thread Konstantin Belousov
[Keep me in Cc:]
On Thu, Jan 24, 2019 at 05:49:46PM +0100, Andreas Longwitz wrote:
> after some more long term research I have an educated guess whats going
> on in this problem.
> 
> The problem only occurs on i386.
> 
> If I replace the counter_u64_fetch() call in pf_state_expires() by
> the value of V_pf_status.states, then pf works without problems, the
> expire time zero problem is gone:
> 
> 
> --- pf.c.1st2018-08-14 10:17:41.0 +0200
> +++ pf.c2019-01-19 17:49:18.0 +0100
> @@ -1542,7 +1542,7 @@
> start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
> if (start) {
> end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
> -   states = counter_u64_fetch(state->rule.ptr->states_cur);
> +   states = V_pf_status.states;
> } else {
> start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
> end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
> 
> The use of counter_u64_fetch() looks a little bit at random for me. For
> all states not associated with the pf_default_rule the value of
> pf_status.states is used and for me this value is ok for all rules.
> 
> Further the counter(9) framework was created for quick and lockless
> write of counters, but fetching is more expansive. So I suggest to let
> pf_state_expires() work without a counter fetch.
> 
> Further I can confirm that the counter states_cur of the pf_default_rule
> remains correct, when the patch given above is active. Without the patch
> the counter on my main firewall machine gets slowly negative. I have
> verified this with a lot of live DTrace and kgdb script debugging.
> 
> >> OK, in the meantime I did some more research and I am now quite sure the
> >> problem with the bogus pf_default_rule->states_cur counter is not a
> >> problem in pf. I am convinced it is a problem in counter(9) on i386
> >> server. The critical code is the machine instruction cmpxchg8b used in
> >> /sys/i386/include/counter.h.
> >> 
> >> From intel instruction set reference manual:
> >> Zhis instruction can be used with a LOCK prefix allow the instruction to
> >> be executed atomically.
> >> 
> >> We have two other sources in kernel using cmpxchg8b:
> >>   /sys/i386/include/atomic.h   and
> >>   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> > 
> > A single CPU instruction is atomic by definition, with regards to the CPU.
> > A preemption can not happen in a middle of instruction. What the "lock"
> > prefix does is memory locking to avoid unlocked parallel access to the
> > same address by different CPUs.
> > 
> > What is special about counter(9) is that %fs:%esi always points to a
> > per-CPU address, because %fs is unique for every CPU and is constant,
> > so no other CPU may write to this address, so lock prefix isn't needed.
> > 
> > Of course a true SMP i386 isn't a well tested arch, so I won't assert
> > that counter(9) doesn't have bugs on this arch. However, I don't see
> > lock prefix necessary here.
> 
> I think the problem is the cmpxchg8b instruction used in
> counter_u64_fetch(), because this machine instruction always writes to
> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
> 
> TEMP64 <- DEST
> IF (EDX:EAX = TEMP64)
>THEN
>   ZF <- 1
>   DEST <- ECX:EBX
>ELSE
>   ZF <- 0
>   EDX:EAX <- TEMP64
>   DEST <- TEMP64
> FI
> 
> If one CPU increments the counter in pf_create_state() and another does
> the fetch, then both CPU's may run the xmpxschg8b at once with the
> chance that both read the same memory value in TEMP64 and the fetching
> CPU is the second CPU that writes and so the increment is lossed. Thats
> what I see without the above patch two or three times a week.

Please try the following patch.  The idea is to make the value to compare
with unlikely to be equal to the memory content, for fetch_one().  Also
it fixes a silly bug in zero_one().

diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
index 7fd26d2a960..aa20831ba18 100644
--- a/sys/i386/include/counter.h
+++ b/sys/i386/include/counter.h
@@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
uint32_t res_lo, res_high;
 
__asm __volatile(
+   "movl   (%0),%%eax\n\t"
+   "movl   4(%0),%%edx\n\t"
+   "addl   $0x1000,%%edx\n\t"  /* avoid write */
"movl   %%eax,%%ebx\n\t"
"movl   %%edx,%%ecx\n\t"
"cmpxchg8b  (%2)"
@@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
 {
 
__asm __volatile(
+"\n1:\n\t"
"movl   (%0),%%eax\n\t"
-   "movl   4(%0),%%edx\n"
+   "movl   4(%0),%%edx\n\t"
"xorl   %%ebx,%%ebx\n\t"
"xorl   %%ecx,%%ecx\n\t"
-"1:\n\t"
"cmpxchg8b  (%0)\n\t"
"jnz1b"
:
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To 

Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2019-01-24 Thread Andreas Longwitz
after some more long term research I have an educated guess whats going
on in this problem.

The problem only occurs on i386.

If I replace the counter_u64_fetch() call in pf_state_expires() by
the value of V_pf_status.states, then pf works without problems, the
expire time zero problem is gone:


--- pf.c.1st2018-08-14 10:17:41.0 +0200
+++ pf.c2019-01-19 17:49:18.0 +0100
@@ -1542,7 +1542,7 @@
start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
if (start) {
end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
-   states = counter_u64_fetch(state->rule.ptr->states_cur);
+   states = V_pf_status.states;
} else {
start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];

The use of counter_u64_fetch() looks a little bit at random for me. For
all states not associated with the pf_default_rule the value of
pf_status.states is used and for me this value is ok for all rules.

Further the counter(9) framework was created for quick and lockless
write of counters, but fetching is more expansive. So I suggest to let
pf_state_expires() work without a counter fetch.

Further I can confirm that the counter states_cur of the pf_default_rule
remains correct, when the patch given above is active. Without the patch
the counter on my main firewall machine gets slowly negative. I have
verified this with a lot of live DTrace and kgdb script debugging.

>> OK, in the meantime I did some more research and I am now quite sure the
>> problem with the bogus pf_default_rule->states_cur counter is not a
>> problem in pf. I am convinced it is a problem in counter(9) on i386
>> server. The critical code is the machine instruction cmpxchg8b used in
>> /sys/i386/include/counter.h.
>> 
>> From intel instruction set reference manual:
>> Zhis instruction can be used with a LOCK prefix allow the instruction to
>> be executed atomically.
>> 
>> We have two other sources in kernel using cmpxchg8b:
>>   /sys/i386/include/atomic.h   and
>>   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> 
> A single CPU instruction is atomic by definition, with regards to the CPU.
> A preemption can not happen in a middle of instruction. What the "lock"
> prefix does is memory locking to avoid unlocked parallel access to the
> same address by different CPUs.
> 
> What is special about counter(9) is that %fs:%esi always points to a
> per-CPU address, because %fs is unique for every CPU and is constant,
> so no other CPU may write to this address, so lock prefix isn't needed.
> 
> Of course a true SMP i386 isn't a well tested arch, so I won't assert
> that counter(9) doesn't have bugs on this arch. However, I don't see
> lock prefix necessary here.

I think the problem is the cmpxchg8b instruction used in
counter_u64_fetch(), because this machine instruction always writes to
memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):

TEMP64 <- DEST
IF (EDX:EAX = TEMP64)
   THEN
  ZF <- 1
  DEST <- ECX:EBX
   ELSE
  ZF <- 0
  EDX:EAX <- TEMP64
  DEST <- TEMP64
FI

If one CPU increments the counter in pf_create_state() and another does
the fetch, then both CPU's may run the xmpxschg8b at once with the
chance that both read the same memory value in TEMP64 and the fetching
CPU is the second CPU that writes and so the increment is lossed. Thats
what I see without the above patch two or three times a week.

Andreas


___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-11-18 Thread Andreas Longwitz
Thank you all for explanation how counter(9) works in detail.

> A single CPU instruction is atomic by definition, with regards to the CPU.
> A preemption can not happen in a middle of instruction. What the "lock"
> prefix does is memory locking to avoid unlocked parallel access to the
> same address by different CPUs.

OK, my view of "atomic" in this context was wrong.

> No, it does not look correct.  The only atomicity guarantee that is required
> from the counter.h inc and zero methods are atomicity WRT context switches.
> The instructions are always executed on the CPU which owns the PCPU element
> in the counter array, and since the update is executed as single instruction,
> it does not require more expensive cache line lock AKA LOCK prefix.  This
> is the main feature of the counters on x86.
> 
> It might read bogus value when fetching the counter but counter.h KPI only
> guarantee is that the readouts are mostly correct.  If you have systematically
> wrong value always read, there is probably something different going on.

On one of my two failing servers I have eliminated all "rdr pass" rules,
so counter(9) is not used at the moment for pf_default_rule.states_cur.
Using DTrace I can see the negative value -49 for this counter:


CPU IDFUNCTION:NAME
  3  1   :BEGIN
feature=bfebfbff, ncpus=4
pf_default_rule.states_cur=0xc82cb3c8
0xc82cb3c8: counter0=0x007bd25b
0xc82cb7c8: counter1=0xffd32262
0xc82cbbc8: counter2=0xffd87de1
0xc82cbfc8: counter3=0xffd88d31
counter =0xffcf

On my other concerned server I have introduces a panic call as soon as
the counter value returned by counter_u64_fetch() in pf_state_expires()
will become negative. So I will wait for the panic and hope for more
information from the kerneldump.

There is one unusual configuration on the two servers: they use pf and
ipfw/ipdivert at once. The reason for this is my use of natd for
incoming ftp requests to my ftp server, pf handles all the other
traffic. This configuration is a little bit tricky but works correct for
many years.
One exception was recently a problem with a buggy remote ftp client I
had to debug. During this period I had to restart/reload ipfw and natd a
couple of times. Because pf also has a reference to ipdivert, perhaps
there is a hidden interaction with the expire problem of pf.

Annotation: the buggy ftp client revealed a problem in natd (PR 230755).

Regards,
Andreas



___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-11-13 Thread Konstantin Belousov
On Tue, Nov 13, 2018 at 11:17:47PM +0100, Kristof Provost wrote:
> On 13 Nov 2018, at 22:01, Andreas Longwitz wrote:
> >>
> >> Are there any hints why the counter pf_default_rule->states_cur
> >> could get a negative value ?
> >>
> >> I’m afraid I have no idea right now.
> >>
> >
> > OK, in the meantime I did some more research and I am now quite sure 
> > the
> > problem with the bogus pf_default_rule->states_cur counter is not a
> > problem in pf. I am convinced it is a problem in counter(9) on i386
> > server. The critical code is the machine instruction cmpxchg8b used in
> > /sys/i386/include/counter.h.
> >
> I’m always happy to hear problems aren’t my fault :)
> 
> >> From intel instruction set reference manual:
> > Zhis instruction can be used with a LOCK prefix allow the instruction 
> > to
> > be executed atomically.
> >
> > We have two other sources in kernel using cmpxchg8b:
> >   /sys/i386/include/atomic.h   and
> >   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> >
> > Both make use of the LOCK feature, in atomic.h a detailed explanation 
> > is
> > given. Because counter.h lacks the LOCK prefix I propose the following
> > patch to get around the leak:
> >
> > --- counter.h.orig   2015-07-03 16:45:36.0 +0200
> > +++ counter.h   2018-11-13 16:07:20.329053000 +0100
> > @@ -60,6 +60,7 @@
> > "movl   %%edx,%%ecx\n\t"
> > "addl   (%%edi),%%ebx\n\t"
> > "adcl   4(%%edi),%%ecx\n\t"
> > +   "lock   \n\t"
> > "cmpxchg8b %%fs:(%%esi)\n\t"
> > "jnz1b"
> > :
> > @@ -76,6 +77,7 @@
> > __asm __volatile(
> > "movl   %%eax,%%ebx\n\t"
> > "movl   %%edx,%%ecx\n\t"
> > +   "lock   \n\t"
> > "cmpxchg8b  (%2)"
> > : "=a" (res_lo), "=d"(res_high)
> > : "SD" (p)
> > @@ -121,6 +123,7 @@
> > "xorl   %%ebx,%%ebx\n\t"
> > "xorl   %%ecx,%%ecx\n\t"
> >  "1:\n\t"
> > +   "lock   \n\t"
> > "cmpxchg8b  (%0)\n\t"
> > "jnz1b"
> > :
> >
> That looks very plausible. I’m somewhat out of my depth here, so I’d 
> like the authors of the counter code to take a look at it.

No, it does not look correct.  The only atomicity guarantee that is required
from the counter.h inc and zero methods are atomicity WRT context switches.
The instructions are always executed on the CPU which owns the PCPU element
in the counter array, and since the update is executed as single instruction,
it does not require more expensive cache line lock AKA LOCK prefix.  This
is the main feature of the counters on x86.

It might read bogus value when fetching the counter but counter.h KPI only
guarantee is that the readouts are mostly correct.  If you have systematically
wrong value always read, there is probably something different going on.
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-11-13 Thread Gleb Smirnoff
On Tue, Nov 13, 2018 at 10:01:14PM +0100, Andreas Longwitz wrote:
A> OK, in the meantime I did some more research and I am now quite sure the
A> problem with the bogus pf_default_rule->states_cur counter is not a
A> problem in pf. I am convinced it is a problem in counter(9) on i386
A> server. The critical code is the machine instruction cmpxchg8b used in
A> /sys/i386/include/counter.h.
A> 
A> From intel instruction set reference manual:
A> Zhis instruction can be used with a LOCK prefix allow the instruction to
A> be executed atomically.
A> 
A> We have two other sources in kernel using cmpxchg8b:
A>   /sys/i386/include/atomic.h   and
A>   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S

A single CPU instruction is atomic by definition, with regards to the CPU.
A preemption can not happen in a middle of instruction. What the "lock"
prefix does is memory locking to avoid unlocked parallel access to the
same address by different CPUs.

What is special about counter(9) is that %fs:%esi always points to a
per-CPU address, because %fs is unique for every CPU and is constant,
so no other CPU may write to this address, so lock prefix isn't needed.

Of course a true SMP i386 isn't a well tested arch, so I won't assert
that counter(9) doesn't have bugs on this arch. However, I don't see
lock prefix necessary here.

-- 
Gleb Smirnoff
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-11-13 Thread Kristof Provost

On 13 Nov 2018, at 22:01, Andreas Longwitz wrote:


Are there any hints why the counter pf_default_rule->states_cur
could get a negative value ?

I’m afraid I have no idea right now.



OK, in the meantime I did some more research and I am now quite sure 
the

problem with the bogus pf_default_rule->states_cur counter is not a
problem in pf. I am convinced it is a problem in counter(9) on i386
server. The critical code is the machine instruction cmpxchg8b used in
/sys/i386/include/counter.h.


I’m always happy to hear problems aren’t my fault :)


From intel instruction set reference manual:
Zhis instruction can be used with a LOCK prefix allow the instruction 
to

be executed atomically.

We have two other sources in kernel using cmpxchg8b:
  /sys/i386/include/atomic.h   and
  /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S

Both make use of the LOCK feature, in atomic.h a detailed explanation 
is

given. Because counter.h lacks the LOCK prefix I propose the following
patch to get around the leak:

--- counter.h.orig   2015-07-03 16:45:36.0 +0200
+++ counter.h   2018-11-13 16:07:20.329053000 +0100
@@ -60,6 +60,7 @@
"movl   %%edx,%%ecx\n\t"
"addl   (%%edi),%%ebx\n\t"
"adcl   4(%%edi),%%ecx\n\t"
+   "lock   \n\t"
"cmpxchg8b %%fs:(%%esi)\n\t"
"jnz1b"
:
@@ -76,6 +77,7 @@
__asm __volatile(
"movl   %%eax,%%ebx\n\t"
"movl   %%edx,%%ecx\n\t"
+   "lock   \n\t"
"cmpxchg8b  (%2)"
: "=a" (res_lo), "=d"(res_high)
: "SD" (p)
@@ -121,6 +123,7 @@
"xorl   %%ebx,%%ebx\n\t"
"xorl   %%ecx,%%ecx\n\t"
 "1:\n\t"
+   "lock   \n\t"
"cmpxchg8b  (%0)\n\t"
"jnz1b"
:

That looks very plausible. I’m somewhat out of my depth here, so I’d 
like the authors of the counter code to take a look at it.


Best regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-11-13 Thread Andreas Longwitz
> 
> Are there any hints why the counter pf_default_rule->states_cur
> could get a negative value ?
> 
> I’m afraid I have no idea right now.
> 

OK, in the meantime I did some more research and I am now quite sure the
problem with the bogus pf_default_rule->states_cur counter is not a
problem in pf. I am convinced it is a problem in counter(9) on i386
server. The critical code is the machine instruction cmpxchg8b used in
/sys/i386/include/counter.h.

From intel instruction set reference manual:
Zhis instruction can be used with a LOCK prefix allow the instruction to
be executed atomically.

We have two other sources in kernel using cmpxchg8b:
  /sys/i386/include/atomic.h   and
  /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S

Both make use of the LOCK feature, in atomic.h a detailed explanation is
given. Because counter.h lacks the LOCK prefix I propose the following
patch to get around the leak:

--- counter.h.orig   2015-07-03 16:45:36.0 +0200
+++ counter.h   2018-11-13 16:07:20.329053000 +0100
@@ -60,6 +60,7 @@
"movl   %%edx,%%ecx\n\t"
"addl   (%%edi),%%ebx\n\t"
"adcl   4(%%edi),%%ecx\n\t"
+   "lock   \n\t"
"cmpxchg8b %%fs:(%%esi)\n\t"
"jnz1b"
:
@@ -76,6 +77,7 @@
__asm __volatile(
"movl   %%eax,%%ebx\n\t"
"movl   %%edx,%%ecx\n\t"
+   "lock   \n\t"
"cmpxchg8b  (%2)"
: "=a" (res_lo), "=d"(res_high)
: "SD" (p)
@@ -121,6 +123,7 @@
"xorl   %%ebx,%%ebx\n\t"
"xorl   %%ecx,%%ecx\n\t"
 "1:\n\t"
+   "lock   \n\t"
"cmpxchg8b  (%0)\n\t"
"jnz1b"
:

Kindly regards,
Andreas


___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-10-27 Thread Andreas Longwitz
> In the problem I have reported for states of "rdr pass" rules I see
> start=6000, end=12000, timeout=86400 and (obviously erroneous, probably
> negative) states=0xffd0.
> 
> I have no idea how that can happen. Just to make sure I understand: you
> know that states is negative here because of a printf() or SDT addition
> in pf_expire_states(), right?

I did not change the kernel, I use DTrace on my firewall server
fwextern. In pf.conf I have changed all productive "rdr pass" rules to a
rdr rule and an extra filter rule. Now only one "rdr pass" rule is left
for test:

  rdr pass on $if_internet proto tcp from 31.17.172.227 to $ip_internet
  port 8022 -> 10.0.0.254

Now I start the following DTrace script pfcounter.d, which will be
active when a SYN on port 8022 arrives:

#!/usr/sbin/dtrace -s

fbt::pf_normalize_tcp:entry
/((*(args[2]->m_hdr.mh_data + 33)) & 0x02) == 0x02 && htons(*(short
*)(args[2]->m_hdr.mh_data + 22)) == 8022/
   /* SYN + port 8022 */
{ self->flag1 = 1; }

fbt::pf_test:return
/self->flag1/ { self->flag1 = 0; }

fbt::pfioctl:entry
/args[1] == 3221767193 && ((struct pfioc_states *)args[2])->ps_len != 0/
   /* DIOCGETSTATES  &&  len != 0 */
{ self->flag2 = 1; }

fbt::counter_u64_fetch:entry
/self->flag2/ { }
fbt::counter_u64_fetch:return
/self->flag2/ { printf("returncode (states_cur)=%d / 0x%x",
args[1], args[1]); }

fbt::pfioctl:return
/self->flag2/
{ self->flag2 = 0; }


Now I run on my remote test client (IP 31.17.172.227) the command

   ssh -p 8022 fwextern sleep 20

This creates on fwextern a state for the "rdr pass" rule with expire
time zero. I must be quick to run "pfctl -vss" on fwextern to see this
state and the output of the DTrace script shows me the "negative" value
of the counter:

=== root@fwextern (pts/0) -> ./pfcounter.d
dtrace: script './pfcounter.d' matched 6 probes
CPU IDFUNCTION:NAME
  3  17624  counter_u64_fetch:entry
  3  17625 counter_u64_fetch:return returncode
(states_cur)=4294967248 / 0xffd0

If I run on the test client the ssh command twice, then the counter is
one less negative than before:

=== root@fwextern (pts/0) -> ./pfcounter.d
dtrace: script './pfcounter.d' matched 6 probes
CPU IDFUNCTION:NAME
  3  17624  counter_u64_fetch:entry
  3  17625 counter_u64_fetch:return returncode
(states_cur)=4294967249 / 0xffd1
  3  17624  counter_u64_fetch:entry
  3  17625 counter_u64_fetch:return returncode
(states_cur)=4294967249 / 0xffd1

Because of "sleep 20" the ssh command does not return and must be
killed. I have observed the problem on two of my firewall servers, the
pf rules never were reloaded since boot. I think there must be an
unknown event in the past, that triggered the negative counter value.

I will try to add a statement to the kernel that recognizes the problem
and go back to the "rdr pass" rules, so next time the problem occurres
we have more information than now.


Kindly regards,
Andreas

___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-10-27 Thread Kristof Provost

On 27 Oct 2018, at 5:22, Andreas Longwitz wrote:

Thanks very much for answer especially for the hint to openbsd.


I wonder if there’s an integer overflow in the of_state_expires()
calculation.
The OpenBSD people have a cast to u_int64_t in their version:

|timeout = (u_int64_t)timeout * (end - states) / (end - start);
|

Perhaps this would fix your problem? (Untested, not even compiled)

|if (end && states > start && start < end) {
if (states < end) {
timeout = (uint64_t)timeout * (end - states) / 
(end - start);

return (state->expire + timeout;
}
else
return (time_uptime);
}
return (state->expire + timeout);


I can confirm the patch of the openbsd people adding the uint64_t cast
makes sense. If
timeout * (end - states)
becomes bigger than UINT32_MAX (I am on i386) the cast prevents the
overflow of this product and the result of the adaptive calculation 
will

always be correct.

Example: start=6000, end=12000, timeout=86400 * 5 (5 days), states=100
 result 140972, result with cast patch 856800.

In the problem I have reported for states of "rdr pass" rules I see
start=6000, end=12000, timeout=86400 and (obviously erroneous, 
probably

negative) states=0xffd0.

I have no idea how that can happen. Just to make sure I understand: you 
know that states is negative here because of a printf() or SDT addition 
in pf_expire_states(), right?



Further the counter variable for states_cur of pf_default_rule is
used für all "rdr/nat/binat pass" rules together. This was a little 
bit

suprising for me, but I think this is intended behaviour. Correct ?


Yes.


Are there any hints why the counter pf_default_rule->states_cur
could get a negative value ?


I’m afraid I have no idea right now.

Best regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-10-27 Thread Andreas Longwitz
Thanks very much for answer especially for the hint to openbsd.

> I wonder if there’s an integer overflow in the of_state_expires()
> calculation.
> The OpenBSD people have a cast to u_int64_t in their version:
> 
> |timeout = (u_int64_t)timeout * (end - states) / (end - start);
> |
> 
> Perhaps this would fix your problem? (Untested, not even compiled)
> 
> |if (end && states > start && start < end) {
> if (states < end) {
> timeout = (uint64_t)timeout * (end - states) / (end - 
> start);
> return (state->expire + timeout;
> }
> else
> return (time_uptime);
> }
> return (state->expire + timeout);

I can confirm the patch of the openbsd people adding the uint64_t cast
makes sense. If
timeout * (end - states)
becomes bigger than UINT32_MAX (I am on i386) the cast prevents the
overflow of this product and the result of the adaptive calculation will
always be correct.

Example: start=6000, end=12000, timeout=86400 * 5 (5 days), states=100
 result 140972, result with cast patch 856800.

In the problem I have reported for states of "rdr pass" rules I see
start=6000, end=12000, timeout=86400 and (obviously erroneous, probably
negative) states=0xffd0. Therefore the condition "states < end" is
false and instead of doing the adaptive calculation the function
pf_states_expires() returns time_update.

In the code snippet

  start = state->rule.ptr->timeout[PFTM_ADAPTIVE_START];
  if (start) {
 end = state->rule.ptr->timeout[PFTM_ADAPTIVE_END];
 states = counter_u64_fetch(state->rule.ptr->states_cur);
  } else {
 start = V_pf_default_rule.timeout[PFTM_ADAPTIVE_START];
 end = V_pf_default_rule.timeout[PFTM_ADAPTIVE_END];
 states = V_pf_status.states;
  }

I see start=6000, so the variable states with value 0xffd0 is hold
in a counter variable. Based on these facts there are two questions
(please notice I have not any adaptive statements in my pf.conf):

1. Why I see start=6000 for states of "rdr pass" rules ? For all states
of filter rules I see always start=0 and therefore a counter variable is
never used.

2, The counter variable states_cur is changed exclusiv by the macros
STATE_INC/DEC_COUNTERS, why one of them can be negative ?

The answer of question 1 I can give myself: All states of filter rules
include a pointer to the underlying rule which includes counter
variables too. A state of a "rdr pass" rule does not have such an
underlying rule, so the global pf_default_rule is used. That means in
the first line of the code snippet above we have

state->rule.ptr == pf_default_rule

and the value of timeout[PFTM_ADAPTIVE_END] indeed is initialized with
6000. Further the counter variable for states_cur of pf_default_rule is
used für all "rdr/nat/binat pass" rules together. This was a little bit
suprising for me, but I think this is intended behaviour. Correct ?

I still don't have an answer for question 2. My problem is best
described in the log of commit 263029:

  Once pf became not covered by a single mutex, many counters in it
  became race prone. Some just gather statistics, but some are later
  used in different calculations.

  A real problem was the race provoked underflow of the states_cur
  counter on a rule. Once it goes below zero, it wraps to UINT32_MAX.
  Later this value is used in pf_state_expires() and any state created
  by this rule is immediately expired.

  Thus, make fields states_cur, states_tot and src_nodes of struct
  pf_rule be counter(9)s.

I run FreeBSD 10 r338093 and of course have this and the following
commits included.

Are there any hints why the counter pf_default_rule->states_cur
could get a negative value ?

 ---
Andreas Longwitz

___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-10-18 Thread Kristof Provost

On 15 Oct 2018, at 15:26, Andreas Longwitz wrote:

On two of my FreeBSD 10 (r338093) firewall servers some incoming ssh
connections stopped to work because pf started to create states with
expire time zero (instead of 86400 sec) for translation statements 
like


rdr pass on em0 proto tcp from any to myip port 8022 --> 10.0.0.254

Therefore a command given on a remote server like

   ssh -p 8022 myip sleep 15

did not return, because the created state for the connection was 
purged

by pf (interval 10 sec) before 15 seconds. If I replace the rdr pass
rule with a rdr rule and a separate filter pass rule then the created
state always has expire time 24h and everything is ok.

I have tried to analyze the bug in the rdr pass rule. Immediately 
after
starting the above ssh command on the remote sever I see with pfctl 
-vss

the sreated state on my firewall machine:

all tcp 10.0.0.254:8022 (myip:8022) <- remoteip:59233
ESTABLISHED:ESTABLISHED
   [1443872812 + 66608] wscale 6  [1365307391 + 66560] wscale 3
   age 00:00:00, expires in 00:00:00, 13:12 pkts, 2955:3306 bytes

and a DTrace script running at the same time gives

  3  19323pfsync_state_export:entry
creation=4491391. expire=4491391, state_timeout=2
  3  16459   pf_state_expires:entry 
state_timeout=86400,

start_timeout=6000, end_timeout=12000 states=882
  3  17624  counter_u64_fetch:entry
  3  17625 counter_u64_fetch:return
returncode (states_cur)=4294967248 = 0xffd0
  3  16460  pf_state_expires:return
returncode=4491391
  3  19324   pfsync_state_export:return
creation=0. expire=0, syncstate_timeout=2
  0  12730   pfioctl:return
returncode=0, time_uptime=4491391

So pf_state_expires() returns the value time_update and therefore
pfsync_state_export() gives expire zero. Reason for this is the very 
big
(means negative) value returned by the function counter_u64_fetch() 
for

states_cur. This variable is of type uint64_t and only incremented and
decremented by the macros STATE_INC_COUNTERS and STATE_DEC_COUNTERS.

I wonder if there’s an integer overflow in the of_state_expires() 
calculation.

The OpenBSD people have a cast to u_int64_t in their version:

timeout = (u_int64_t)timeout * (end - states) / (end - start);

Perhaps this would fix your problem? (Untested, not even compiled)

if (end && states > start && start < end) {
if (states < end) {
timeout = (uint64_t)timeout * 
(end - states) / (end - start);
return (state->expire + timeout;
}
else
return (time_uptime);
}
return (state->expire + timeout);
}

Best regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections

2018-10-15 Thread Andreas Longwitz
On two of my FreeBSD 10 (r338093) firewall servers some incoming ssh
connections stopped to work because pf started to create states with
expire time zero (instead of 86400 sec) for translation statements like

rdr pass on em0 proto tcp from any to myip port 8022 --> 10.0.0.254

Therefore a command given on a remote server like

   ssh -p 8022 myip sleep 15

did not return, because the created state for the connection was purged
by pf (interval 10 sec) before 15 seconds. If I replace the rdr pass
rule with a rdr rule and a separate filter pass rule then the created
state always has expire time 24h and everything is ok.

I have tried to analyze the bug in the rdr pass rule. Immediately after
starting the above ssh command on the remote sever I see with pfctl -vss
the sreated state on my firewall machine:

all tcp 10.0.0.254:8022 (myip:8022) <- remoteip:59233
ESTABLISHED:ESTABLISHED
   [1443872812 + 66608] wscale 6  [1365307391 + 66560] wscale 3
   age 00:00:00, expires in 00:00:00, 13:12 pkts, 2955:3306 bytes

and a DTrace script running at the same time gives

  3  19323pfsync_state_export:entry
creation=4491391. expire=4491391, state_timeout=2
  3  16459   pf_state_expires:entry state_timeout=86400,
start_timeout=6000, end_timeout=12000 states=882
  3  17624  counter_u64_fetch:entry
  3  17625 counter_u64_fetch:return
returncode (states_cur)=4294967248 = 0xffd0
  3  16460  pf_state_expires:return
returncode=4491391
  3  19324   pfsync_state_export:return
creation=0. expire=0, syncstate_timeout=2
  0  12730   pfioctl:return
returncode=0, time_uptime=4491391

So pf_state_expires() returns the value time_update and therefore
pfsync_state_export() gives expire zero. Reason for this is the very big
(means negative) value returned by the function counter_u64_fetch() for
states_cur. This variable is of type uint64_t and only incremented and
decremented by the macros STATE_INC_COUNTERS and STATE_DEC_COUNTERS.

I see two possibilities: Maybe counter(9) is broken for my hardware
(i386, cpu_feature=0xbfebfbff) or STATE_DEC_COUNTERS sometimes is called
 without STATE_INC_COUNTERS called before. In the above example the
value of states_cur is 1 on a fresh booted system, so we may have a
difference of 49 in calls of the COUNTER macros. I see the same number
49 in the output of pfctl -si in the line for short packets. This may be
a hint or its random.


-- 
Andreas Longwitz

___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"