On 2016-03-29 10:58, Christian Ruppert wrote:
Hi Willy,

On 2016-03-25 18:17, Willy Tarreau wrote:
On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote:
I think it's even different (but could be wrong) since Christian spoke
about counters suddenly doubling. The issue you faced, Sylvain, which I
unfortunately still have no idea how to fix, is that the peers applet
is not always woken up when a connection establishes on the other side
and it may simply miss an event, resulting in everything remaining
stable and appearing frozen until the connection closes. Here it seems
data are exchanged but incorrect. This one could be easier to reproduce
however, we'll check.

OK I found it. Indeed it was easy to reproduce. The frequency counters
are sent as "now - freq.date", which is a positive age compared to the
current date. But on receipt, this age was *added* to the current date
instead of subtracted. So since the date was always in the future, they
were always expired if the activity changed side in less than the
counter's measuring period (eg: 10s).
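
To make the sign error concrete, here is a minimal sketch of the
round trip. It is not the HAProxy code: the helper names encode_age,
decode_buggy and decode_fixed are hypothetical, and ticks are assumed
to be plain wrapping 32-bit millisecond counters.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not HAProxy's API.
 * Ticks are modelled as wrapping 32-bit millisecond counters. */

/* Sender side: the counter date is sent as a positive age. */
static uint32_t encode_age(uint32_t now_ms, uint32_t freq_date)
{
    return now_ms - freq_date;
}

/* Receiver side before the fix: the age is added, so the
 * reconstructed date lands in the future. */
static uint32_t decode_buggy(uint32_t now_ms, uint32_t age)
{
    return now_ms + age;
}

/* Receiver side after the fix: the age is subtracted, which
 * recovers a date in the past. */
static uint32_t decode_fixed(uint32_t now_ms, uint32_t age)
{
    return now_ms - age;
}
```

Assuming both peers read now_ms = 100000 and the counter was last
updated at 97000, the age on the wire is 3000. decode_fixed then
returns 97000, whereas decode_buggy returns 103000, a date 3 seconds
in the future.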

I'm committing this simple fix that you can apply to your tree for now.

Cheers,
Willy

diff --git a/src/peers.c b/src/peers.c
index c29ea73..9918dac 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -1153,7 +1153,7 @@ switchstate:
                     case STD_T_FRQP: {
                         struct freq_ctr_period data;

-                        data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end));
+                        data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end));
                         if (!msg_cur) {
                             /* malformed message */
                             appctx->st0 = PEER_SESS_ST_ERRPROTO;

Thanks a lot for the fast investigation! The proposed patch seems to
do the trick :)

Hrm, or not. At least not completely.
There's still something wrong it seems:
20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601 gpc0=0 conn_cnt=682 conn_rate(10000)=1 conn_cur=3 sess_cnt=1 sess_rate(10000)=-1032058827 http_req_cnt=0 http_req_rate(10000)=2272 http_err_cnt=3 http_err_rate(10000)=1143800 bytes_in_cnt=0 bytes_out_cnt=247977

Note that the sess_rate is a negative int. Some http_err_rate values
seem to be affected as well. Even the http_req_rate still seems to be
wrong in some cases.

20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259 gpc0=0 conn_cnt=86 conn_rate(10000)=0 conn_cur=7 sess_cnt=0 sess_rate(10000)=0 http_req_cnt=0 http_req_rate(10000)=349038424 http_err_cnt=6 http_err_rate(10000)=0 bytes_in_cnt=0 bytes_out_cnt=3261818950

We're using httpclose, so in this case it *actually* should match the
conn_cnt, i.e. 86.

--
Regards,
Christian Ruppert
