[BUG] haproxy 1.8-last/master-worker/peers

2017-11-17 Thread Emmanuel Hocdet

Hi,

In master-worker mode with peers, old worker never died after a reload (kill 
-USR2).

Tested without traffic, with/without threads.
Without peers, no problems.

++
Manu




Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-18 Thread Willy Tarreau
Hi Manu,

On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
> In master-worker mode with peers, old worker never died after a reload (kill 
> -USR2).
> 
> Tested without traffic, with/without threads.
> Without peers, no problems.

Thanks for the report. I'm currently working on an issue causing some
streams never to end, and it *could* possibly also affect peers. We'll
have to try again once I manage to sort this out.

Cheers,
Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-27 Thread Emmanuel Hocdet

Hi Willy,

> Le 18 nov. 2017 à 12:28, Willy Tarreau  a écrit :
> 
> Hi Manu,
> 
> On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
>> In master-worker mode with peers, old worker never died after a reload (kill 
>> -USR2).
>> 
>> Tested without traffic, with/without threads.
>> Without peers, no problems.
> 
> Thanks for the report. I'm currently working on an issue causing some
> streams never to end, and it *could* possibly also affect peers. We'll
> have to try again once I manage to sort this out.

same issue with 1.8.0.

++
Manu




Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-27 Thread Willy Tarreau
Hi Manu,

On Mon, Nov 27, 2017 at 06:21:50PM +0100, Emmanuel Hocdet wrote:
> Hi Willy,
> 
> > Le 18 nov. 2017 à 12:28, Willy Tarreau  a écrit :
> > 
> > Hi Manu,
> > 
> > On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
> >> In master-worker mode with peers, old worker never died after a reload 
> >> (kill -USR2).
> >> 
> >> Tested without traffic, with/without threads.
> >> Without peers, no problems.
> > 
> > Thanks for the report. I'm currently working on an issue causing some
> > streams never to end, and it *could* possibly also affect peers. We'll
> > have to try again once I manage to sort this out.
> 
> same issue with 1.8.0.

Then I'll need a reproducer because I couldn't reproduce it.

Thanks,
Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Emmanuel Hocdet

Hi Willy,

> Le 28 nov. 2017 à 07:33, Willy Tarreau  a écrit :
> 
> Hi Manu,
> 
> On Mon, Nov 27, 2017 at 06:21:50PM +0100, Emmanuel Hocdet wrote:
>> Hi Willy,
>> 
>>> Le 18 nov. 2017 à 12:28, Willy Tarreau  a écrit :
>>> 
>>> Hi Manu,
>>> 
>>> On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
 In master-worker mode with peers, old worker never died after a reload 
 (kill -USR2).
 
 Tested without traffic, with/without threads.
 Without peers, no problems.
>>> 
>>> Thanks for the report. I'm currently working on an issue causing some
>>> streams never to end, and it *could* possibly also affect peers. We'll
>>> have to try again once I manage to sort this out.
>> 
>> same issue with 1.8.0.
> 
> Then I'll need a reproducer because I couldn't reproduce it.
> 


ok, i should have something strange because it’s easy to reproduce in my 
environnement.

When i look lsof i see on master:
haproxy 21355 root4u  IPv4 39007164  0t0  TCP 10.101.20.4:943 
(LISTEN)
it’s link to self ip configuration "peer L6_2 10.101.20.4:943"

Master listen on peer? really?
on each reload (kill -USR2)  on more LISTEN appears

++
Manu





Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Frederic Lecaille

Hi Manu,

On 11/28/2017 12:22 PM, Emmanuel Hocdet wrote:


Hi Willy,

Le 28 nov. 2017 à 07:33, Willy Tarreau mailto:w...@1wt.eu>> a 
écrit :


Hi Manu,

On Mon, Nov 27, 2017 at 06:21:50PM +0100, Emmanuel Hocdet wrote:

Hi Willy,

Le 18 nov. 2017 à 12:28, Willy Tarreau mailto:w...@1wt.eu>> 
a écrit :


Hi Manu,

On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
In master-worker mode with peers, old worker never died after a 
reload (kill -USR2).


Tested without traffic, with/without threads.
Without peers, no problems.


Thanks for the report. I'm currently working on an issue causing some
streams never to end, and it *could* possibly also affect peers. We'll
have to try again once I manage to sort this out.


same issue with 1.8.0.


Then I'll need a reproducer because I couldn't reproduce it.



ok, i should have something strange because it’s easy to reproduce in my 
environnement.


When i look lsof i see on master:
haproxy 21355 root    4u  IPv4 39007164      0t0      TCP 
10.101.20.4:943 (LISTEN)

it’s link to self ip configuration "peer L6_2 10.101.20.4:943"

Master listen on peer? really?
on each reload (kill -USR2)  on more LISTEN appears


In fact without reloading anything there are race condition issues in 
peers code.


I was working on another issue when I noticed this new one:

With only two configured peers, I see that they try to reconnect to each 
others every 10s in place of 5s as before.


This is as if a peer was blocking the other one during 5s and preventing 
it to reconnect... or something similar.


Then, after a few cycles, they try to connect to each other exactly at 
the same time. Only a "hello" message, is sent from both side without 
receiving any response.









Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Frederic Lecaille

On 11/28/2017 01:29 PM, Frederic Lecaille wrote:

Hi Manu,

On 11/28/2017 12:22 PM, Emmanuel Hocdet wrote:


Hi Willy,

Le 28 nov. 2017 à 07:33, Willy Tarreau mailto:w...@1wt.eu>> a 
écrit :


Hi Manu,

On Mon, Nov 27, 2017 at 06:21:50PM +0100, Emmanuel Hocdet wrote:

Hi Willy,

Le 18 nov. 2017 à 12:28, Willy Tarreau mailto:w...@1wt.eu>> 
a écrit :


Hi Manu,

On Fri, Nov 17, 2017 at 05:14:11PM +0100, Emmanuel Hocdet wrote:
In master-worker mode with peers, old worker never died after a 
reload (kill -USR2).


Tested without traffic, with/without threads.
Without peers, no problems.


Thanks for the report. I'm currently working on an issue causing some
streams never to end, and it *could* possibly also affect peers. We'll
have to try again once I manage to sort this out.


same issue with 1.8.0.


Then I'll need a reproducer because I couldn't reproduce it.



ok, i should have something strange because it’s easy to reproduce in 
my environnement.


When i look lsof i see on master:
haproxy 21355 root    4u  IPv4 39007164      0t0      TCP 
10.101.20.4:943 (LISTEN)

it’s link to self ip configuration "peer L6_2 10.101.20.4:943"

Master listen on peer? really?
on each reload (kill -USR2)  on more LISTEN appears


In fact without reloading anything there are race condition issues in 
peers code.


I was working on another issue when I noticed this new one:

With only two configured peers, I see that they try to reconnect to each 
others every 10s in place of 5s as before.


This is as if a peer was blocking the other one during 5s and preventing 
it to reconnect... or something similar.


Then, after a few cycles, they try to connect to each other exactly at 
the same time. Only a "hello" message, is sent from both side without 
receiving any response.


There is a "/* fall through */" between PEER_SESS_ST_CONNECT and 
PEER_SESS_ST_GETSTATUS cases in peer I/O handler code, so with a lock 
which may be taken two times ;).


This would at least explain why two peers which have sent their "hello" 
messages at the time (PEER_SESS_ST_CONNECT state) are both deadlocked at 
the same time ;).











Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Willy Tarreau
On Tue, Nov 28, 2017 at 01:43:52PM +0100, Frederic Lecaille wrote:
> On 11/28/2017 01:29 PM, Frederic Lecaille wrote:
> There is a "/* fall through */" between PEER_SESS_ST_CONNECT and
> PEER_SESS_ST_GETSTATUS cases in peer I/O handler code, so with a lock which
> may be taken two times ;).

>From what I'm seeing it cannot take it twice because it only grabs the
lock when curpeer switches from null to non-null.

Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Frederic Lecaille

On 11/28/2017 02:13 PM, Willy Tarreau wrote:

On Tue, Nov 28, 2017 at 01:43:52PM +0100, Frederic Lecaille wrote:

On 11/28/2017 01:29 PM, Frederic Lecaille wrote:
There is a "/* fall through */" between PEER_SESS_ST_CONNECT and
PEER_SESS_ST_GETSTATUS cases in peer I/O handler code, so with a lock which
may be taken two times ;).


 From what I'm seeing it cannot take it twice because it only grabs the
lock when curpeer switches from null to non-null.


Yes. Wrong remark on my side.

What if peer state switched from PEER_SESS_ST_CONNECT to 
PEER_SESS_ST_GETPEER?


This may happen if the peer has just sent a hello message and just after 
received a hello message.





Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread Willy Tarreau
On Tue, Nov 28, 2017 at 02:17:19PM +0100, Frederic Lecaille wrote:
> On 11/28/2017 02:13 PM, Willy Tarreau wrote:
> > On Tue, Nov 28, 2017 at 01:43:52PM +0100, Frederic Lecaille wrote:
> > > On 11/28/2017 01:29 PM, Frederic Lecaille wrote:
> > > There is a "/* fall through */" between PEER_SESS_ST_CONNECT and
> > > PEER_SESS_ST_GETSTATUS cases in peer I/O handler code, so with a lock 
> > > which
> > > may be taken two times ;).
> > 
> >  From what I'm seeing it cannot take it twice because it only grabs the
> > lock when curpeer switches from null to non-null.
> 
> Yes. Wrong remark on my side.
> 
> What if peer state switched from PEER_SESS_ST_CONNECT to
> PEER_SESS_ST_GETPEER?
> 
> This may happen if the peer has just sent a hello message and just after
> received a hello message.

>From what I'm reading (but I could be wrong), this transition does
not exist, as the only way to reach GETPEER is via the following
sequence :

   (new) -> ACCEPT -> GETVERSION -> GETHOST -> GETPEER

Anyway, a double lock is easy to spot, as the offending process would
spin at 100% CPU. That's why I think we are facing a different issue
here.

Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread William Lallemand
On Tue, Nov 28, 2017 at 12:22:04PM +0100, Emmanuel Hocdet wrote:
> ok, i should have something strange because it’s easy to reproduce in my 
> environnement.
> 
> When i look lsof i see on master:
> haproxy 21355 root4u  IPv4 39007164  0t0  TCP 10.101.20.4:943 
> (LISTEN)
> it’s link to self ip configuration "peer L6_2 10.101.20.4:943"
> 
> Master listen on peer? really?
> on each reload (kill -USR2)  on more LISTEN appears
> 

I can reproduce easily the problem, the peers listener is not closed since I 
removed
the deinit from the master-worker code, however that does not explain why the 
old
worker does not quit.

-- 
William Lallemand



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-28 Thread William Lallemand
On Tue, Nov 28, 2017 at 02:56:55PM +0100, William Lallemand wrote:
> On Tue, Nov 28, 2017 at 12:22:04PM +0100, Emmanuel Hocdet wrote:
> > ok, i should have something strange because it’s easy to reproduce in my 
> > environnement.
> > 
> > When i look lsof i see on master:
> > haproxy 21355 root4u  IPv4 39007164  0t0  TCP 10.101.20.4:943 
> > (LISTEN)
> > it’s link to self ip configuration "peer L6_2 10.101.20.4:943"
> > 
> > Master listen on peer? really?
> > on each reload (kill -USR2)  on more LISTEN appears
> > 
> 
> I can reproduce easily the problem, the peers listener is not closed since I 
> removed
> the deinit from the master-worker code, however that does not explain why the 
> old
> worker does not quit.
> 

Apparently this is reproducible without the master-worker, upon a reload with a 
local peer,
the previous process doesn't leave.

-- 
William Lallemand



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-29 Thread Willy Tarreau
On Tue, Nov 28, 2017 at 05:08:53PM +0100, William Lallemand wrote:
> On Tue, Nov 28, 2017 at 02:56:55PM +0100, William Lallemand wrote:
> > On Tue, Nov 28, 2017 at 12:22:04PM +0100, Emmanuel Hocdet wrote:
> > > ok, i should have something strange because it's easy to reproduce in my 
> > > environnement.
> > > 
> > > When i look lsof i see on master:
> > > haproxy 21355 root4u  IPv4 39007164  0t0  TCP 10.101.20.4:943 
> > > (LISTEN)
> > > it's link to self ip configuration "peer L6_2 10.101.20.4:943"
> > > 
> > > Master listen on peer? really?
> > > on each reload (kill -USR2)  on more LISTEN appears
> > > 
> > 
> > I can reproduce easily the problem, the peers listener is not closed since 
> > I removed
> > the deinit from the master-worker code, however that does not explain why 
> > the old
> > worker does not quit.
> > 
> 
> Apparently this is reproducible without the master-worker, upon a reload with 
> a local peer,
> the previous process doesn't leave.

I can now reproduce it. It happens that my previous failed tests didn't
trigger it because no stick table was referencing the peers.

I could bisect it to this commit from a well-known bug author :

  commit 3e13cbafe2612dc026494d90ce8604f08cdaf58d
  Author: Willy Tarreau 
  Date:   Sun Oct 8 11:26:30 2017 +0200

MEDIUM: session: make use of the connection's destroy callback

Now we don't remove the session when a stream dies, instead we
detach the stream and let the mux decide to release the connection
and call session_free() instead.

I don't immediately see how it can be related, but I suspect that it
results in peers connections not properly being closed.

Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-29 Thread Willy Tarreau
On Wed, Nov 29, 2017 at 10:41:18AM +0100, Willy Tarreau wrote:
> > Apparently this is reproducible without the master-worker, upon a reload 
> > with a local peer,
> > the previous process doesn't leave.
> 
> I can now reproduce it. It happens that my previous failed tests didn't
> trigger it because no stick table was referencing the peers.
> 
> I could bisect it to this commit from a well-known bug author :
> 
>   commit 3e13cbafe2612dc026494d90ce8604f08cdaf58d
>   Author: Willy Tarreau 
>   Date:   Sun Oct 8 11:26:30 2017 +0200
> 
> MEDIUM: session: make use of the connection's destroy callback
> 
> Now we don't remove the session when a stream dies, instead we
> detach the stream and let the mux decide to release the connection
> and call session_free() instead.
> 
> I don't immediately see how it can be related, but I suspect that it
> results in peers connections not properly being closed.

I understand now. It's embarrassing. Sessions are released only when
instanciated from a connection but never from an applet. So indeed we
keep a non-null job count which prevents the old process from quitting.

I'w working on this now.

Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-29 Thread Willy Tarreau
On Wed, Nov 29, 2017 at 11:54:28AM +0100, Willy Tarreau wrote:
> On Wed, Nov 29, 2017 at 10:41:18AM +0100, Willy Tarreau wrote:
> > > Apparently this is reproducible without the master-worker, upon a reload 
> > > with a local peer,
> > > the previous process doesn't leave.
> > 
> > I can now reproduce it. It happens that my previous failed tests didn't
> > trigger it because no stick table was referencing the peers.
> > 
> > I could bisect it to this commit from a well-known bug author :
> > 
> >   commit 3e13cbafe2612dc026494d90ce8604f08cdaf58d
> >   Author: Willy Tarreau 
> >   Date:   Sun Oct 8 11:26:30 2017 +0200
> > 
> > MEDIUM: session: make use of the connection's destroy callback
> > 
> > Now we don't remove the session when a stream dies, instead we
> > detach the stream and let the mux decide to release the connection
> > and call session_free() instead.
> > 
> > I don't immediately see how it can be related, but I suspect that it
> > results in peers connections not properly being closed.
> 
> I understand now. It's embarrassing. Sessions are released only when
> instanciated from a connection but never from an applet. So indeed we
> keep a non-null job count which prevents the old process from quitting.

Now fixed in both mainline and 1.8. Thanks for the report Manu!
Willy



Re: [BUG] haproxy 1.8-last/master-worker/peers

2017-11-29 Thread Willy Tarreau
On Wed, Nov 29, 2017 at 02:29:56PM +0100, Willy Tarreau wrote:
> On Wed, Nov 29, 2017 at 11:54:28AM +0100, Willy Tarreau wrote:
> > On Wed, Nov 29, 2017 at 10:41:18AM +0100, Willy Tarreau wrote:
> > > > Apparently this is reproducible without the master-worker, upon a 
> > > > reload with a local peer,
> > > > the previous process doesn't leave.
> > > 
> > > I can now reproduce it. It happens that my previous failed tests didn't
> > > trigger it because no stick table was referencing the peers.
> > > 
> > > I could bisect it to this commit from a well-known bug author :
> > > 
> > >   commit 3e13cbafe2612dc026494d90ce8604f08cdaf58d
> > >   Author: Willy Tarreau 
> > >   Date:   Sun Oct 8 11:26:30 2017 +0200
> > > 
> > > MEDIUM: session: make use of the connection's destroy callback
> > > 
> > > Now we don't remove the session when a stream dies, instead we
> > > detach the stream and let the mux decide to release the connection
> > > and call session_free() instead.
> > > 
> > > I don't immediately see how it can be related, but I suspect that it
> > > results in peers connections not properly being closed.
> > 
> > I understand now. It's embarrassing. Sessions are released only when
> > instanciated from a connection but never from an applet. So indeed we
> > keep a non-null job count which prevents the old process from quitting.
> 
> Now fixed in both mainline and 1.8. Thanks for the report Manu!

And now seeing it again when there are data to synchronize :-(

Willy