Re: [PATCH] allow higher averages than 16448ms

2016-11-25 Thread Willy Tarreau
Hi again Reinhard,

On Fri, Nov 25, 2016 at 09:17:41AM +0100, Reinhard Vicinus wrote:
> Hi Willy,
> 
> if the cost is too high, then I have no problem keeping the known
> behavior. The only thing I would suggest is to document it, because it
> caused me some headache to figure out why the values were always too low
> and I couldn't find any information that this behavior is a known problem.

I finally found a much more elegant solution by improving the formula to
save one multiply. Not only does this avoid the overflow without changing
the integer size, it's also faster :-)

Now the limit is 8.4M milliseconds of average time, or around 2h18m, which
should be plenty for most situations! I'm attaching the patch I've
just merged for this.

Best regards,
Willy
From 3758581e197dd7b390fb3da94c08f34e0d319c07 Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Fri, 25 Nov 2016 11:55:10 +0100
Subject: BUG/MINOR: freq-ctr: make swrate_add() support larger values

Reinhard Vicinus reported that the measured average response times cannot
be larger than about 16s due to the double multiply performed by
swrate_add(), which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448 ms.

One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.

There is in fact another way to avoid the overflow without using larger
integers: it consists in avoiding the multiply, using the fact that
x*(N-1)/N = x - x/N.

Now it becomes possible to store average values as large as 8.4 million,
which is around 2h18mn.
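
A minimal side-by-side sketch of the change (illustration only, not part of
the patch; the function names are just for the comparison): with N = 512 the
old rule's intermediate multiply wraps once the sum exceeds UINT_MAX / 511
(about 8.4M, i.e. an average around 16.4s), while the new rule only wraps
when the sum itself approaches UINT_MAX, i.e. an average around 8.4M ms.

static unsigned int add_old(unsigned int *sum, unsigned int n, unsigned int v)
{
	return *sum = *sum * (n - 1) / n + v;        /* multiply may overflow */
}

static unsigned int add_new(unsigned int *sum, unsigned int n, unsigned int v)
{
	return *sum = *sum - (*sum + n - 1) / n + v; /* no multiply at all */
}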

Interestingly, this improvement also makes the code cheaper to execute on
both 32-bit and 64-bit platforms:

Before :

 :
   0:   8b 54 24 04             mov    0x4(%esp),%edx
   4:   8b 0a                   mov    (%edx),%ecx
   6:   89 c8                   mov    %ecx,%eax
   8:   c1 e0 09                shl    $0x9,%eax
   b:   29 c8                   sub    %ecx,%eax
   d:   8b 4c 24 0c             mov    0xc(%esp),%ecx
  11:   c1 e8 09                shr    $0x9,%eax
  14:   01 c8                   add    %ecx,%eax
  16:   89 02                   mov    %eax,(%edx)

After :

0020 :
  20:   8b 4c 24 04             mov    0x4(%esp),%ecx
  24:   8b 44 24 0c             mov    0xc(%esp),%eax
  28:   8b 11                   mov    (%ecx),%edx
  2a:   01 d0                   add    %edx,%eax
  2c:   81 c2 ff 01 00 00       add    $0x1ff,%edx
  32:   c1 ea 09                shr    $0x9,%edx
  35:   29 d0                   sub    %edx,%eax
  37:   89 01                   mov    %eax,(%ecx)

This fix may be backported to 1.6.
---
 include/proto/freq_ctr.h | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/proto/freq_ctr.h b/include/proto/freq_ctr.h
index 65388b1..70b295e 100644
--- a/include/proto/freq_ctr.h
+++ b/include/proto/freq_ctr.h
@@ -182,7 +182,19 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
  *
  * So basically by summing values and applying the last result an (N-1)/N factor
  * we just get N times the values over the long term, so we can recover the
- * constant value V by dividing by N.
+ * constant value V by dividing by N. In order to limit the impact of integer
+ * overflows, we'll use this equivalence which saves us one multiply :
+ *
+ *               N - 1                    1                 x0
+ *    x1 = x0 * -------   =  x0 * ( 1 - ----- )   =  x0 - -----
+ *                 N                      N                 N
+ *
+ * And given that x0 is discrete here we'll have to saturate the values before
+ * performing the divide, so the value insertion will become :
+ *
+ *               x0 + N - 1
+ *    x1 = x0 - ------------
+ *                    N
  *
  * A value added at the entry of the sliding window of N values will thus be
  * reduced to 1/e or 36.7% after N terms have been added. After a second batch,
@@ -220,7 +232,7 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
  */
 static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
 {
-	return *sum = *sum * (n - 1) / n + v;
+	return *sum = *sum - (*sum + n - 1) / n + v;
 }
 
 /* Returns the average sample value for the sum <sum> over a sliding window of
-- 
1.7.12.1
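
A quick standalone sketch of how the two helpers fit together (illustration
only, not part of the patch; N = 512 matches the window size discussed in
this thread, and the swrate_avg() body is the one already in the header):

#include <stdio.h>

#define N 512

static unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
{
	return *sum = *sum - (*sum + n - 1) / n + v;
}

static unsigned int swrate_avg(unsigned int sum, unsigned int n)
{
	return (sum + n - 1) / n;
}

int main(void)
{
	unsigned int sum = 0;
	int i;

	for (i = 0; i < 10000; i++)
		swrate_add(&sum, N, 50000);   /* 50s requests, as in the report */

	/* converges to 50000 instead of overflowing like the old formula */
	printf("avg = %u ms\n", swrate_avg(sum, N));
	return 0;
}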



Re: [PATCH] allow higher averages than 16448ms

2016-11-25 Thread Reinhard Vicinus
Hi Willy,

if the cost is too high, then I have no problem keeping the known
behavior. The only thing I would suggest is to document it, because it
caused me some headache to figure out why the values were always too low
and I couldn't find any information that this behavior is a known problem.

Best regards
Reinhard

On 11/25/2016 08:04 AM, Willy Tarreau wrote:
> Hi Reinhard,
>
> On Thu, Nov 24, 2016 at 10:04:31PM +0100, Reinhard Vicinus wrote:
>> Hi,
>>
>> we use haproxy (1.6.9) to balance very long running POST requests
>> (around 50 seconds) to backend servers. It generally works like a charm,
>> but the average queue time and average total session time statistic
>> values are totally screwed up.
>>
>> The problem is that the average is calculated like this for every request:
>>
>> sum = sum * 511 / 512 + value
>>
>> for a fixed value and enough iterations:
>>
>> sum = value * 511
>>
>> the problem is that at every iteration sum will first be multiplied by
>> 511 and therefore the maximum value during the calculation is:
>>
>> value * 511 * 511
>>
>> An unsigned int can store a maximum value of 4294967295. Divided by
>> 511*511, this results in 16448. That means any backend with average times
>> above 16448ms will be affected by integer overflow and have wrong values.
> Yes we do know this limitation.
>
>> The attached patch tries to solve this by storing and calculating sum as
>> unsigned long long instead of an unsigned int. I don't know if the
>> attached patch will work in every case, but during my limited testing it
>> worked.
> It will definitely work, but I didn't want to do it because of the
> very expensive cost of the 64x64 multiply and divide on 32-bit
> platforms, which causes a measurable performance impact. However I'll
> do some tests because it is often OK, and doing 32x32/32 with a 64-bit
> intermediary result is OK as well. If I can do it this way, I'll do
> it. Otherwise I'd prefer that we just switch to long so that 64-bit
> platforms can benefit from the large timers and 32-bit ones are
> limited to lower values.
>
> Thanks,
> Willy
>


-- 
Reinhard Vicinus
Metaways Infosystems GmbH
Pickhuben 2, D-20457 Hamburg

E-Mail: r.vici...@metaways.de
Web:    http://www.metaways.de
Tel:    +49 (0)40 317031-524
Fax:    +49 (0)40 317031-10

Metaways Infosystems GmbH - Sitz: D-22967 Tremsbüttel
Handelsregister: Amtsgericht Lübeck HRB 4508 AH
Geschäftsführung: Hermann Thaele, Lüder-H. Thaele





Re: [PATCH] allow higher averages than 16448ms

2016-11-24 Thread Willy Tarreau
Hi Reinhard,

On Thu, Nov 24, 2016 at 10:04:31PM +0100, Reinhard Vicinus wrote:
> Hi,
> 
> we use haproxy (1.6.9) to balance very long running POST requests
> (around 50 seconds) to backend servers. It generally works like a charm,
> but the average queue time and average total session time statistic
> values are totally screwed up.
> 
> The problem is that the average is calculated like this for every request:
> 
> sum = sum * 511 / 512 + value
> 
> for a fixed value and enough iterations:
> 
> sum = value * 511
> 
> the problem is that at every iteration sum will first be multiplied by
> 511 and therefore the maximum value during the calculation is:
> 
> value * 511 * 511
>
> An unsigned int can store a maximum value of 4294967295. Divided by
> 511*511, this results in 16448. That means any backend with average times
> above 16448ms will be affected by integer overflow and have wrong values.

Yes we do know this limitation.

> The attached patch tries to solve this by storing and calculating sum as
> unsigned long long instead of an unsigned int. I don't know if the
> attached patch will work in every case, but during my limited testing it
> worked.

It will definitely work, but I didn't want to do it because of the
very expensive cost of the 64x64 multiply and divide on 32-bit
platforms, which causes a measurable performance impact. However I'll
do some tests because it is often OK, and doing 32x32/32 with a 64-bit
intermediary result is OK as well. If I can do it this way, I'll do
it. Otherwise I'd prefer that we just switch to long so that 64-bit
platforms can benefit from the large timers and 32-bit ones are
limited to lower values.
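
One possible reading of the "32x32/32 with a 64-bit intermediary result"
idea, sketched for illustration only (hypothetical name; this is not the
fix that was eventually merged, which avoids the multiply altogether): the
stored sum stays an unsigned int and only the multiply is widened.

static inline unsigned int swrate_add_wide(unsigned int *sum, unsigned int n,
                                           unsigned int v)
{
	/* 32x32 multiply with a 64-bit intermediate: sum * (n - 1) cannot
	 * overflow, and the division by n brings it back into 32-bit range */
	return *sum = (unsigned long long)*sum * (n - 1) / n + v;
}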

Thanks,
Willy



[PATCH] allow higher averages than 16448ms

2016-11-24 Thread Reinhard Vicinus
Hi,

we use haproxy (1.6.9) to balance very long running POST requests
(around 50 seconds) to backend servers. It generally works like a charm,
but the average queue time and average total session time statistic
values are totally screwed up.

The problem is that the average is calculated like this for every request:

sum = sum * 511 / 512 + value

for a fixed value and enough iterations:

sum = value * 511

the problem is that at every iteration sum will first be multiplied by
511 and therefore the maximum value during the calculation is:

value * 511 * 511

An unsigned int can store a maximum value of 4294967295. Divided by
511*511, this results in 16448. That means any backend with average times
above 16448ms will be affected by integer overflow and have wrong values.
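
As a concrete example following this reasoning: for a steady value of
20000 ms the sum settles around 20000 * 511 = 10220000, and the next update
first computes 10220000 * 511 = 5222420000, which is larger than 4294967295
and therefore wraps around.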

The attached patch tries to solve this by storing and calculating sum as
unsigned long long instead of an unsigned int. I don't know if the
attached patch will work in every case, but during my limited testing it
worked. I'm also not sure if the implicit conversion from unsigned long
long to unsigned int in swrate_avg is a good idea. It should never be a
problem, because value is an unsigned int and the average can never be
higher than the maximum value. But I can understand if there are coding
conventions against implicit conversions. Generally I don't have a lot
of C experience, so feel free to improve the patch or solve the problem
another way.
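
(For illustration, one way to make that narrowing explicit in the proposed
swrate_avg(); since the average of unsigned int samples always fits in an
unsigned int, the cast does not lose information:)

static inline unsigned int swrate_avg(unsigned long long sum, unsigned int n)
{
	return (unsigned int)((sum + n - 1) / n);
}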

Kind regards
Reinhard
From b298b230b541bb33d1cbea92f5472c926e10b7f0 Mon Sep 17 00:00:00 2001
From: "Vicinus, Reinhard" <r.vici...@metaways.de>
Date: Thu, 24 Nov 2016 21:41:17 +0100
Subject: [PATCH] allow higher averages than 16448ms

---
 include/proto/freq_ctr.h | 4 ++--
 include/types/counters.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/proto/freq_ctr.h b/include/proto/freq_ctr.h
index 65388b1..099a58d 100644
--- a/include/proto/freq_ctr.h
+++ b/include/proto/freq_ctr.h
@@ -218,7 +218,7 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
 /* Adds sample value <v> to sliding window sum <sum> configured for <n> samples.
  * The sample is returned. Better if <n> is a power of two.
  */
-static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
+static inline unsigned long long swrate_add(unsigned long long *sum, unsigned int n, unsigned int v)
 {
 	return *sum = *sum * (n - 1) / n + v;
 }
@@ -227,7 +227,7 @@ static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
  * <n> samples. Better if <n> is a power of two. It must be the same <n> as the
  * one used above in all additions.
  */
-static inline unsigned int swrate_avg(unsigned int sum, unsigned int n)
+static inline unsigned int swrate_avg(unsigned long long sum, unsigned int n)
 {
 	return (sum + n - 1) / n;
 }
diff --git a/include/types/counters.h b/include/types/counters.h
index 3e62763..08bcee7 100644
--- a/include/types/counters.h
+++ b/include/types/counters.h
@@ -56,7 +56,7 @@ struct pxcounters {
 	long long redispatches; /* retried and redispatched connections (BE only) */
 	long long intercepted_req;  /* number of monitoring or stats requests intercepted by the frontend */
 
-	unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
+	unsigned long long q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
 
 	union {
 		struct {
@@ -100,7 +100,7 @@ struct srvcounters {
 	long long retries, redispatches;	/* retried and redispatched connections */
 	long long failed_secu;			/* blocked responses because of security concerns */
 
-	unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
+	unsigned long long q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
 
 	union {
 		struct {
-- 
2.10.2