Re: Lua: forcing garbage collector after socket i/o

2020-01-22 Thread Dave Chiluk
We are running this patch on top of 1.9.13 where it is needed. I will
report back if/when we have anything to add.  Until then, consider no
news as good news in this regard.

Dave.

On Tue, Jan 14, 2020 at 9:37 AM Willy Tarreau  wrote:
>
> On Tue, Jan 14, 2020 at 09:31:07AM -0600, Dave Chiluk wrote:
> > Can we get this backported onto the 2.0 and 1.9 stable streams?  It
> > looks like it mostly applies cleanly *(aside from line numbers).
>
> Given that the risk of regression is far from zero (hence the tag "medium"),
> I'd rather avoid for a while and observe instead. Very few users will notice
> an improvement, maybe only two, but every Lua user would have to accept the
> risk of a possible regression, so care is mandatory. We'd do it to 2.1 first,
> and after a few releases possibly to 2.0 if there is compelling demand for
> this. By then 1.9 will likely be dead anyway.
>
> If you're facing a high enough Lua-based connection rate that would make this
> a nice improvement to the point where you'd be taking the risk to use the
> backport, I think everyone would very much appreciate that you run with this
> patch for a while to help confirm it doesn't break anything.
>
> Thanks,
> Willy



Re: Lua: forcing garbage collector after socket i/o

2020-01-14 Thread Dave Chiluk
Can we get this backported onto the 2.0 and 1.9 stable streams?  It
looks like it mostly applies cleanly *(aside from line numbers).

Thanks,
Dave

On Tue, Jan 14, 2020 at 3:49 AM Willy Tarreau  wrote:
>
> On Mon, Jan 13, 2020 at 10:11:57AM -0800, Sadasiva Gujjarlapudi wrote:
> > Sounds good to me.
> > Thank you so much once again.
>
> OK now merged. Thanks guys!
>
> Willy
>



Re: Haproxy nbthreads + multi-threading lua?

2019-12-13 Thread Dave Chiluk
After a bit more research I discovered that the lua scripts are
actually from signal sciences.

You should have a conversation with Signal Sciences about how they are
doing ingress capture that goes through HAProxy.
https://docs.signalsciences.net/install-guides/other-modules/haproxy-module/

Dave.
p.s. Yes we did meet at KubeCon, and I really appreciated your
suggestions on healthchecking.  I just haven't had a chance to
check/test them because of higher priority issues that have arisen
*(isn't this always the case).  And no, this isn't even one of those
higher priority issues.

On Wed, Dec 11, 2019 at 2:35 AM Baptiste  wrote:
>
> On Mon, Dec 2, 2019 at 5:15 PM Dave Chiluk  wrote:
>>
>> Since 2.0 nbproc and nbthread are now mutually exclusive, are there
>> any ways to make lua multi-threaded?
>>
>> One of our proxies makes heavy use of lua scripting.  I'm not sure if
>> this is still the case, but in earlier versions of HAProxy lua was
>> single threaded per process.  Because of this we were running that
>> proxy with nbproc=4, and nbthread=4. This allowed us to scale without
>> being limited by lua.
>>
>> Has lua single-threadedness now been solved?  Are there other options
>> I should be aware of related to that?  What's the preferred way around
>> this?
>>
>> Thanks,
>> Dave.
>>
>
> Hi Dave,
> (I think we met at kubecon)
>
> What's your use case for Lua exactly?
> Can't it be replaced by SPOE at some point? (which is compatible with 
> nbthread and can run heavy processing outside of the HAProxy process)?
>
> You can answer me privately if you don't want such info to be public.
>
> Baptiste



Haproxy nbthreads + multi-threading lua?

2019-12-02 Thread Dave Chiluk
Since 2.0 nbproc and nbthread are now mutually exclusive, are there
any ways to make lua multi-threaded?

One of our proxies makes heavy use of lua scripting.  I'm not sure if
this is still the case, but in earlier versions of HAProxy lua was
single threaded per process.  Because of this we were running that
proxy with nbproc=4, and nbthread=4. This allowed us to scale without
being limited by lua.

Has lua single-threadedness now been solved?  Are there other options
I should be aware of related to that?  What's the preferred way around
this?

Thanks,
Dave.



Re: Status of 1.5 ?

2019-11-26 Thread Dave Chiluk
Ubuntu 16.04 is on 1.6, which is bug-fix "supported" until 2021.  It's
probably fine to deprecate it next year.
Ubuntu 18.04 is on 1.8, which is bug-fix "supported" until 2023.

Debian has 1.8 in their stable and 2.0.9 in unstable, but I'm not as
familiar with their release cycles.
The RHEL/CentOS 7 haproxy package is on 1.5, but they've also provided
rh-haproxy18, which provides 1.8.

AFAICT from a distro perspective you are pretty good to kill off 1.5.

Dave.
FYI, I'm an Ubuntu Dev if you ever need one.

On Tue, Nov 26, 2019 at 7:00 AM Willy Tarreau  wrote:
>
> Hi Vincent,
>
> On Tue, Nov 26, 2019 at 01:33:30PM +0100, Vincent Bernat wrote:
> > On 25 October 2019 at 11:27 +02, Willy Tarreau wrote:
> >
> > > Now I'm wondering, is anyone interested in this branch to still be
> > > maintained ? Should I emit a new release with a few pending fixes
> > > just to flush the pipe and pursue its "critical fixes only" status a
> > > bit further, or should we simply declare it unmaintained ? I'm fine
> > > with either option, it's just that I hate working for no reason, and
> > > this version was released a bit more than 5 years ago now, so I can
> > > easily expect that it has few to no user by now.
> > >
> > > Please just let me know what you think,
> >
> > What's the conclusion? :)
>
> Oh you're right, I wanted to mention it yesterday but the e-mail delivery
> issues derailed my focus a bit...
>
> So it looks like the most reasonable thing to do is to drop it at the end
> of this year, or exactly 3 years after the last update to the branch! I
> don't expect it to require any new fix at all to be honest. Those using
> it for SSL should really upgrade to something more recent, at least to
> benefit from more recent openssl versions (1.0.1 was probably the last
> supported one) and those who don't need SSL likely didn't even upgrade
> to 1.5 anyway ;-)
>
> So we could say that if anything really critical must happen to 1.5, it
> must happen within one month for it to get a fix and after that it's too
> late.
>
> Cheers,
> Willy
>



Re: Increase in sockets in TIME_WAIT with 1.9.x

2019-06-13 Thread Dave Chiluk
I was able to bisect this down to 53216e7 being the problematic commit,
when using calls to setsockopt(... SO_LINGER ...) as the test metric.

I counted the number of setsockopt calls with SO_LINGER in them using the
following command:
$ sudo timeout 60s strace -e setsockopt,close \
    -p $(ps -lf -C haproxy | tail -n 1 | awk -e '{print $4}') 2>&1 | tee 1.9-${V} ; \
    grep LINGER 1.9-${V} | wc -l

53216e7 = 1
81a15af6b = 69

Interestingly, 1.8.17 only has roughly 17.  I'll see if I can
do a bisection for that tomorrow.  Hope that helps.

Dave.

On Thu, Jun 13, 2019 at 3:30 PM Willy Tarreau  wrote:

> On Thu, Jun 13, 2019 at 03:20:20PM -0500, Dave Chiluk wrote:
> > I've attached an haproxy.cfg that is as minimal as I felt comfortable.
> (...)
>
> many thanks for this, Dave, I truly appreciate it. I'll have a look at
> it hopefully tomorrow morning.
>
> Willy
>


Re: Increase in sockets in TIME_WAIT with 1.9.x

2019-06-13 Thread Dave Chiluk
I've attached an haproxy.cfg that is as minimal as I felt comfortable.  We
are using admin sockets for dynamic configuration of backends, so we left the
server-templating in, but no other application was configured to
orchestrate haproxy at the time of testing.

I've also attached output from
$ sudo timeout 60s strace -e setsockopt,close \
    -p $(ps -lf -C haproxy | tail -n 1 | awk -e '{print $4}') 2>&1 | tee 1.8.17

Which shows the significant decrease in setting of SO_LINGER.  I guess I
lied earlier when I said there were none, but over 60s it looks like
1.9.8 had 1/17 the number of SO_LINGER setsockopt calls vs 1.8.17.

Unfortunately the number of sockets sitting in TIME_WAIT fluctuates to the
point where there's not a great metric to use.  Looking at SO_LINGER
settings does appear to be consistent though.  I bet if I spawned 700
backend instances instead of 7 it would be more pronounced.

I got perf stack traces for setsockopt from both versions on our production
servers, but inlining made those traces mostly useless.

Let me know if there's anything else I can grab.

Dave.


On Thu, Jun 13, 2019 at 1:30 AM Willy Tarreau  wrote:

> On Wed, Jun 12, 2019 at 12:08:03PM -0500, Dave Chiluk wrote:
> > I did a bit more introspection on our TIME_WAIT connections.  The increase
> > in sockets in TIME_WAIT is definitely from old connections to our backend
> > server instances.  Considering the fact that this server doesn't
> > actually serve real traffic, we can make a reasonable assumption that this
> > is almost entirely due to increases in healthchecks.
>
> Great!
>
> > Doing an strace on haproxy 1.8.17 we see
> > 
> > sudo strace -e setsockopt,close -p 15743
> > strace: Process 15743 attached
> > setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> > setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
> > close(17)   = 0
> > 
> >
> > Doing the same strace on 1.9.8 we see
> > 
> > sudo strace -e setsockopt,close -p 6670
> > strace: Process 6670 attached
> > setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> > close(4)= 0
> > 
> >
> > The calls to setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0},
> 8)
> > = 0
> > appear to be missing.
>
> Awesome, that's exactly the info I was missing. I suspected that for
> whatever reason the lingering was not disabled, at least now we have
> a proof of this! Now the trick is to figure why :-/
>
> > We are running centos 7 with kernel 3.10.0-957.1.3.el7.x86_64.
>
> OK, and with the setsockopt it should behave properly.
>
> > I'll keep digging into this, and see if I can get stack traces that result
> > in the setsockopt calls on 1.8.17 so the stack can be more closely
> > inspected.
>
> Don't worry for this now, this is something we at least need to resolve
> before issuing 2.0 or it will cause some trouble. Then we'll backport the
> fix once the cause is figured out.
>
> However when I try here I don't have the problem, either in 1.9.8 or
> 2.0-dev7 :
>
> 08:27:30.212570 connect(14, {sa_family=AF_INET, sin_port=htons(9003),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> 08:27:30.212590 recvfrom(14, NULL, 2147483647,
> MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource
> temporarily unavailable)
> 08:27:30.212610 setsockopt(14, SOL_SOCKET, SO_LINGER, {l_onoff=1,
> l_linger=0}, 8) = 0
> 08:27:30.212630 close(14)   = 0
> 08:27:30.212659 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0,
> tv_nsec=6993282}) = 0
>
> So it must depend on the type of check. Could you please share a
> minimalistic
> config that reproduces this ?
>
> Thanks,
> Willy
>


1.8.17
Description: Binary data


domain_map
Description: Binary data


1.9.8
Description: Binary data


haproxy.cfg
Description: Binary data


Re: Increase in sockets in TIME_WAIT with 1.9.x

2019-06-12 Thread Dave Chiluk
I did a bit more introspection on our TIME_WAIT connections.  The increase
in sockets in TIME_WAIT is definitely from old connections to our backend
server instances.  Considering the fact that this server doesn't
actually serve real traffic, we can make a reasonable assumption that this
is almost entirely due to increases in healthchecks.
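
For anyone wanting to reproduce the count, something like the following
should show only the backend-facing sockets (the port is just an example,
substitute whatever your backends listen on; the first line of ss output is
a header):

$ sudo ss -tn state time-wait '( dport = :7070 )' | wc -l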

Doing an strace on haproxy 1.8.17 we see

sudo strace -e setsockopt,close -p 15743
strace: Process 15743 attached
setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(17)   = 0


Doing the same strace on 1.9.8 we see

sudo strace -e setsockopt,close -p 6670
strace: Process 6670 attached
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
close(4)= 0


The calls to setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8)
= 0
appear to be missing.

We are running centos 7 with kernel 3.10.0-957.1.3.el7.x86_64.

I'll keep digging into this, and see if I can get stack traces that result
in the setsockopt calls on 1.8.17 so the stack can be more closely
inspected.

Thanks for any help,
Dave


On Tue, Jun 11, 2019 at 2:29 AM Willy Tarreau  wrote:

> On Mon, Jun 10, 2019 at 04:01:27PM -0500, Dave Chiluk wrote:
> > We are in the process of evaluating upgrading to 1.9.8 from 1.8.17,
> > and we are seeing a roughly 70% increase in sockets in TIME_WAIT on
> > our haproxy servers with a mostly idle server cluster
> > $ sudo netstat | grep 'TIME_WAIT' | wc -l
>
> Be careful, TIME_WAIT on the frontend is neither important nor
> representative of anything, only the backend counts.
>
> > Looking at the source/destination of this it seems likely that this
> > comes from healthchecks.  We also see a corresponding load increase on
> > the backend applications serving the healthchecks.
>
> It's very possible and problematic at the same time.
>
> > Checking the git logs for healthcheck was unfruitful.  Any clue what
> > might be going on?
>
> Normally we make lots of efforts to close health-check responses with
> a TCP RST (by disabling lingering before closing). I don't see why it
> wouldn't be done here. What OS are you running on and what do your
> health checks look like in the configuration ?
>
> Thanks,
> Willy
>


Increase in sockets in TIME_WAIT with 1.9.x

2019-06-10 Thread Dave Chiluk
We are in the process of evaluating upgrading to 1.9.8 from 1.8.17,
and we are seeing a roughly 70% increase in sockets in TIME_WAIT on
our haproxy servers with a mostly idle server cluster:
$ sudo netstat | grep 'TIME_WAIT' | wc -l

Looking at the source/destination of this it seems likely that this
comes from healthchecks.  We also see a corresponding load increase on
the backend applications serving the healthchecks.

Checking the git logs for healthcheck was unfruitful.  Any clue what
might be going on?

Thanks,
Dave.



Re: What to look out for when going from 1.6 to 1.8?

2018-07-16 Thread Dave Chiluk
We have the same use case as Alex *(mesos load balancing), and we can also
confirm that our config worked without changes going from 1.6 to 1.8.

Given our testing, you should consider the seamless reload -x option and
the dynamic server configuration APIs.  Both have greatly alleviated issues
we've faced in our microservices-based cloud.

Dave.

On Mon, Jul 16, 2018 at 8:47 AM Alex Evonosky 
wrote:

> Tim-
>
> I can speak from a production point of view that we had HAproxy on the 1.6
> branch inside docker containers for mesos load balancing with pretty much
> the same requirements as you spoke of.  After compiling Haproxy to the 1.8.x
> branch the same config worked without issues.
>
> -Alex
>
>
> On Mon, Jul 16, 2018 at 9:39 AM, Tim Verhoeven  > wrote:
>
>> Hello all,
>>
>> We have been running the 1.6 branch of HAProxy, without any issues, for a
>> while now. And reading the updates around 1.8 here in the mailing list it
>> looks like its time to upgrade to this branch.
>>
>> So I was wondering if there are any things I need to look of for when
>> doing this upgrade? We are not doing anything special with HAProxy (I
>> think). We run it as a single process, we use SSL/TLS termination, some
>> ACL's and a bunch of backends. We only use HTTP 1.1 and TCP connections.
>>
>> From what I've been able to gather my current config will work just as
>> good with 1.8. But some extra input from all the experts here is always
>> appreciated.
>>
>> Thanks,
>> Tim
>>
>
>


[PATCH] [MINOR] Some spelling cleanup in the comments.

2018-06-21 Thread Dave Chiluk
Signed-off-by: Dave Chiluk 
---
 include/common/cfgparse.h | 2 +-
 src/session.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
index c003bd3b0..6e35bc948 100644
--- a/include/common/cfgparse.h
+++ b/include/common/cfgparse.h
@@ -92,7 +92,7 @@ int parse_process_number(const char *arg, unsigned long 
*proc, int *autoinc, cha
 
 /*
  * Sends a warning if proxy  does not have at least one of the
- * capabilities in . An optionnal  may be added at the end
+ * capabilities in . An optional  may be added at the end
  * of the warning to help the user. Returns 1 if a warning was emitted
  * or 0 if the condition is valid.
  */
diff --git a/src/session.c b/src/session.c
index c1bd2d6b5..ae2d9e1d9 100644
--- a/src/session.c
+++ b/src/session.c
@@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
 }
 
 /* This function is called from the protocol layer accept() in order to
- * instanciate a new session on behalf of a given listener and frontend. It
+ * instantiate a new session on behalf of a given listener and frontend. It
  * returns a positive value upon success, 0 if the connection can be ignored,
  * or a negative value upon critical failure. The accepted file descriptor is
  * closed if we return <= 0. If no handshake is needed, it immediately tries
- * to instanciate a new stream. The created connection's owner points to the
+ * to instantiate a new stream. The created connection's owner points to the
  * new session until the upper layers are created.
  */
 int session_accept_fd(struct listener *l, int cfd, struct sockaddr_storage 
*addr)
-- 
2.17.1




Re: [PATCH] [MINOR] Some spelling cleanup in comments.

2018-06-21 Thread Dave Chiluk
I'm sorry, I just realized I applied this against the 1.8 stable branch.

I'll send another patch for 1.9.

On Thu, Jun 21, 2018 at 10:55 AM Dave Chiluk 
wrote:

> Some spelling cleanup in comments.
>
> Signed-off-by: Dave Chiluk 
> ---
>  include/common/cfgparse.h | 2 +-
>  include/types/task.h  | 2 +-
>  src/session.c | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
> index c3355ca4..3022b8d8 100644
> --- a/include/common/cfgparse.h
> +++ b/include/common/cfgparse.h
> @@ -90,7 +90,7 @@ int parse_process_number(const char *arg, unsigned long
> *proc, int *autoinc, cha
>
>  /*
>   * Sends a warning if proxy  does not have at least one of the
> - * capabilities in . An optionnal  may be added at the end
> + * capabilities in . An optional  may be added at the end
>   * of the warning to help the user. Returns 1 if a warning was emitted
>   * or 0 if the condition is valid.
>   */
> diff --git a/include/types/task.h b/include/types/task.h
> index 991e3a46..ac8c4339 100644
> --- a/include/types/task.h
> +++ b/include/types/task.h
> @@ -64,7 +64,7 @@ struct notification {
>  struct task {
> struct eb32sc_node rq;  /* ebtree node used to hold the
> task in the run queue */
> unsigned short state;   /* task state : bit field of
> TASK_* */
> -   unsigned short pending_state;   /* pending states for running talk
> */
> +   unsigned short pending_state;   /* pending states for running task
> */
> short nice; /* the task's current nice value
> from -1024 to +1024 */
> unsigned int calls; /* number of times ->process() was
> called */
> struct task * (*process)(struct task *t);  /* the function which
> processes the task */
> diff --git a/src/session.c b/src/session.c
> index 318c1716..898dbaab 100644
> --- a/src/session.c
> +++ b/src/session.c
> @@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
>  }
>
>  /* This function is called from the protocol layer accept() in order to
> - * instanciate a new session on behalf of a given listener and frontend.
> It
> + * instantiate a new session on behalf of a given listener and frontend.
> It
>   * returns a positive value upon success, 0 if the connection can be
> ignored,
>   * or a negative value upon critical failure. The accepted file
> descriptor is
>   * closed if we return <= 0. If no handshake is needed, it immediately
> tries
> - * to instanciate a new stream. The created connection's owner points to
> the
> + * to instantiate a new stream. The created connection's owner points to
> the
>   * new session until the upper layers are created.
>   */
>  int session_accept_fd(struct listener *l, int cfd, struct
> sockaddr_storage *addr)
> --
> 2.17.1
>
>


[PATCH] [MINOR] Some spelling cleanup in comments.

2018-06-21 Thread Dave Chiluk
Some spelling cleanup in comments.

Signed-off-by: Dave Chiluk 
---
 include/common/cfgparse.h | 2 +-
 include/types/task.h  | 2 +-
 src/session.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
index c3355ca4..3022b8d8 100644
--- a/include/common/cfgparse.h
+++ b/include/common/cfgparse.h
@@ -90,7 +90,7 @@ int parse_process_number(const char *arg, unsigned long 
*proc, int *autoinc, cha
 
 /*
  * Sends a warning if proxy  does not have at least one of the
- * capabilities in . An optionnal  may be added at the end
+ * capabilities in . An optional  may be added at the end
  * of the warning to help the user. Returns 1 if a warning was emitted
  * or 0 if the condition is valid.
  */
diff --git a/include/types/task.h b/include/types/task.h
index 991e3a46..ac8c4339 100644
--- a/include/types/task.h
+++ b/include/types/task.h
@@ -64,7 +64,7 @@ struct notification {
 struct task {
struct eb32sc_node rq;  /* ebtree node used to hold the task in 
the run queue */
unsigned short state;   /* task state : bit field of TASK_* */
-   unsigned short pending_state;   /* pending states for running talk */
+   unsigned short pending_state;   /* pending states for running task */
short nice; /* the task's current nice value from 
-1024 to +1024 */
unsigned int calls; /* number of times ->process() was 
called */
struct task * (*process)(struct task *t);  /* the function which 
processes the task */
diff --git a/src/session.c b/src/session.c
index 318c1716..898dbaab 100644
--- a/src/session.c
+++ b/src/session.c
@@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
 }
 
 /* This function is called from the protocol layer accept() in order to
- * instanciate a new session on behalf of a given listener and frontend. It
+ * instantiate a new session on behalf of a given listener and frontend. It
  * returns a positive value upon success, 0 if the connection can be ignored,
  * or a negative value upon critical failure. The accepted file descriptor is
  * closed if we return <= 0. If no handshake is needed, it immediately tries
- * to instanciate a new stream. The created connection's owner points to the
+ * to instantiate a new stream. The created connection's owner points to the
  * new session until the upper layers are created.
  */
 int session_accept_fd(struct listener *l, int cfd, struct sockaddr_storage 
*addr)
-- 
2.17.1




Re: Truly seamless reloads

2018-06-01 Thread Dave Chiluk
The patches are all cherry-picks from the 1.8 branch that I backported to
the 1.7 branch.  They are all documented with the original development tree
SHA as well.

Have fun
Dave.

On Fri, Jun 1, 2018, 6:16 AM Veiko Kukk  wrote:

> On 31/05/18 23:15, William Lallemand wrote:
> > Sorry but unfortunately we are not backporting features in stable
> branches,
> > those are only meant for maintenance.
> >
> > People who want to use the seamless reload should migrate to HAProxy
> 1.8, the
> > stable team won't support this feature in previous branches.
>
>
> I've been keeping an eye on this list for 1.8-related bugs and it does
> not seem to me that 1.8 is stable enough yet for production use. Too many
> reports about high CPU usage and/or crashes.
> We are still using 1.6 which finally seems to have stabilized enough for
> production. When we started using 1.6 some years ago, we had many issues
> with it which caused service interruptions. Would not want to repeat
> that again.
>
> Even with 1.7, processes would hang forever after reload (days,
> sometimes weeks or until reboot). Really hard to debug, happens only
> under production load.
>
> I will look at patches provided by Dave. We are building HAproxy rpm-s
> for ourselves anyway, applying some patches in spec file does not seem
> to be that much additional work if indeed those would provide truly
> seamless reloads.
>
> Best regards,
> Veiko
>
>


Re: Truly seamless reloads

2018-05-29 Thread Dave Chiluk
I backported the necessary patchset for seamless reloads on top of 1.7.9 a
while back.  It was used in production without issue for quite some time.

I just rebased those patches on top of haproxy-1.7 development and pushed
the result to a seamless_reload branch on GitHub.  They apply
cleanly, but I have not built or tested them, nor do I have the time to do
so at the moment.
https://github.com/chiluk/haproxy-1.7
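
If anyone wants to try them, roughly the following should get a build to
poke at (the make target and flags here are only an example; use whatever
you normally build 1.7 with):

$ git clone https://github.com/chiluk/haproxy-1.7
$ cd haproxy-1.7
$ git checkout seamless_reload
$ make TARGET=linux2628 USE_OPENSSL=1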

I've also attached the patchset for completeness.  Happy reloading.

I think the 1.7 maintainer should pick these patches up, as the hard work
has already been done.

Dave.



On Mon, Apr 30, 2018 at 4:26 AM William Lallemand 
wrote:

> On Mon, Apr 30, 2018 at 10:35:37AM +0300, Veiko Kukk wrote:
> > On 26/04/18 17:11, Veiko Kukk wrote:
> > > Hi,
> > >
> > > According to
> > >
> https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/
> > > :
> > >
> > > "The patchset has already been merged into the HAProxy 1.8 development
> > > branch and will soon be backported to HAProxy Enterprise Edition 1.7r1
> > > and possibly 1.6r2."
> > >
> > > Has it been backported to 1.7 and/or 1.6?
> > >
> > > If yes, then should seamless reload also work with multiprocess
> > > configurations? (nbproc > 1).
> >
> > Can i assume the answer is no for both questions?
> >
> >
> > Veiko
> >
>
> Hello Veiko,
>
> Indeed, the seamless reload is only available since HAProxy 1.8.
>
> It supports multiprocess configuration.
>
>
> --
> William Lallemand
>
>
From cd0e6748ad7ce13ff9db07b7e32e56a0c77f1afe Mon Sep 17 00:00:00 2001
From: William Lallemand 
Date: Fri, 26 May 2017 18:19:55 +0200
Subject: [PATCH 10/10] MEDIUM: proxy: zombify proxies only when the expose-fd
 socket is bound

When HAProxy is running with multiple processes and some listeners
are bound to processes, the unused sockets were not closed in the other
processes. The aim was to be able to send those listening sockets using
the -x option.

However to ensure the previous behavior which was to close those
sockets, we provided the "no-unused-socket" global option.

This patch changes this behavior, it will close unused sockets which are
not in the same process as an expose-fd socket, making the
"no-unused-socket" option useless.

The "no-unused-socket" option was removed in this patch.

(cherry picked from commit 7f80eb2383bb54ddafecf0e7df6b3b3ef4b4f6e5)
Signed-off-by: Dave Chiluk 
---
 doc/configuration.txt |  7 ---
 src/cfgparse.c|  5 -
 src/haproxy.c | 19 ++-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 9bb9cb9..980b253 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -587,7 +587,6 @@ The following keywords are supported in the "global" section :
- nosplice
- nogetaddrinfo
- noreuseport
-   - no-unused-socket
- spread-checks
- server-state-base
- server-state-file
@@ -1250,12 +1249,6 @@ noreuseport
   Disables the use of SO_REUSEPORT - see socket(7). It is equivalent to the
   command line argument "-dR".
 
-no-unused-socket
-  By default, each haproxy process keeps all sockets opened, event those that
-  are only used by another processes, so that any process can provide all the
-  sockets, to make reloads seamless. This option disables this, and close all
-  unused sockets, to save some file descriptors.
-
 spread-checks <0..50, in percent>
   Sometimes it is desirable to avoid sending agent and health checks to
   servers at exact intervals, for instance when many logical servers are
diff --git a/src/cfgparse.c b/src/cfgparse.c
index be21088..8c0906b 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -671,11 +671,6 @@ int cfg_parse_global(const char *file, int linenum, char **args, int kwm)
 			goto out;
 		global.tune.options &= ~GTUNE_USE_REUSEPORT;
 	}
-	else if (!strcmp(args[0], "no-unused-socket")) {
-		if (alertif_too_many_args(0, file, linenum, args, _code))
-			goto out;
-		global.tune.options &= ~GTUNE_SOCKET_TRANSFER;
-	}
 	else if (!strcmp(args[0], "quiet")) {
 		if (alertif_too_many_args(0, file, linenum, args, _code))
 			goto out;
diff --git a/src/haproxy.c b/src/haproxy.c
index 2091573..f7605e0 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -975,7 +975,6 @@ void init(int argc, char **argv)
 #if defined(SO_REUSEPORT)
 	global.tune.options |= GTUNE_USE_REUSEPORT;
 #endif
-	global.tune.options |= GTUNE_SOCKET_TRANSFER;
 
 	pid = getpid();
 	progname = *argv;
@@ -2306,6 +2305,24 @@ int main(int argc, char **argv)
 			exit(0); /* parent must leave */
 		}
 
+		/* pass through every cli socket, and check if it's bound to
+		 * the current process and if it ex

Re: remaining process after (seamless) reload

2018-05-29 Thread Dave Chiluk
We've battled the same issue with our haproxies.  We root-caused it to slow
DNS lookups during config parsing, which made haproxy take so long to parse
its config that we were attempting to reload again before the original
reload had completed.  I'm still not sure why or where the signals to the
old haproxy are getting dropped, but we found that by installing a DNS
cache on our haproxy nodes we were able to greatly decrease the likelihood
of creating zombie haproxy instances.

We further improved on that by rearchitecting our micro-service
architecture to make use of the haproxy dynamic scaling APIs and
allocating dummy slots for future expansion, similar to
https://www.haproxy.com/blog/dynamic-scaling-for-microservices-with-runtime-api/
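
In case it helps, the shape of what we ended up with looks roughly like this
(the backend name, addresses, ports and socket path are illustrative, not
our real config):

# haproxy.cfg: pre-allocate dummy slots that start out disabled
backend be_app
    server-template slot 10 192.0.2.1:7070 check disabled

# later, when a new task comes up, fill a slot over the runtime API
$ echo "set server be_app/slot1 addr 10.40.40.2 port 7070" | \
      socat stdio unix-connect:/var/run/haproxy.sock
$ echo "set server be_app/slot1 state ready" | \
      socat stdio unix-connect:/var/run/haproxy.sock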

Good luck, I hope that's the answer to your problem.
Dave.

On Tue, May 29, 2018 at 10:12 AM William Dauchy  wrote:

> Hello William,
>
> Sorry for the late answer.
>
> > Are the problematical workers leaving when you reload a second time?
>
> no, they seem to stay for a long time (forever?)
>
> > Did you try to kill -USR1 the worker ? It should exits with "Former
> worker $PID
> > exited with code 0" on stderr.
> > If not, could you check the Sig* lines in /proc/$PID/status for this
> worker?
>
> will try. I need to put the setup back in shape, and maybe test
> without multiple binding.
>
> > Do you know how much time take haproxy to load its configuration, and do
> you
> > think you tried a reload before it finished to parse and load the config?
> > Type=notify in your systemd unit file should help for this case. If I
> remember
> > well it checks that the service is 'ready' before trying to reload.
>
> We are using Type=notify. I however cannot guarantee we do not trigger
> a new reload, before the previous one is done. Is there a way to check
> the "ready" state you mentioned?
> (We are talking about a reload every 10 seconds maximum though)
>
> > I suspect the SIGUSR1 signal is not received by the worker, but I'm not
> sure
> > either if it's the master that didn't send it or if the worker blocked
> it.
>
> good to know.
>
> Best,
> --
> William
>
>


Re: haproxy startup at boot too quick

2018-05-14 Thread Dave Chiluk
Assuming you are running an Ubuntu archive version of haproxy, you should
consider opening a bug in Launchpad as well.
https://launchpad.net/ubuntu/+source/haproxy/+filebug

It sounds like there's a missing dependency in the unit file on DNS or the
network, but I haven't looked into it beyond what you've mentioned here.
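
If it is ordering, the usual fix is a drop-in that makes the unit wait for
the network (and name resolution) to be up, something like this (the
drop-in file name is just an example):

$ sudo mkdir -p /etc/systemd/system/haproxy.service.d
$ sudo tee /etc/systemd/system/haproxy.service.d/wait-for-network.conf <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target nss-lookup.target
EOF
$ sudo systemctl daemon-reload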

Dave.

On Mon, May 7, 2018 at 7:57 PM Bill Waggoner  wrote:

> On Mon, May 7, 2018 at 8:44 PM Kevin Decherf  wrote:
>
>> Hello,
>>
>> On 8 May 2018 02:32:01 CEST, Bill Waggoner  wrote:
>>
>> >Anyway, when the system boots haproxy fails to start. Unfortunately I
>> >forgot to save the systemctl status message but the impression I get is
>> >that it's starting too soon.
>>
>> You can find all past logs of your service using `journalctl -u
>> haproxy.service`. If journal persistence is off you'll not be able to look
>> at logs sent before the last boot.
>>
>>
>> --
>> Sent from my mobile. Please excuse my brevity.
>>
>
> Thank you, that was very helpful. I am new to systemd so please forgive my
> lack of knowledge.
>
> Looking at the messages it looks like one server was failing to start.
> That one happens to have a name instead of a static address in the server
> definition. My guess is that DNS isn't available yet when haproxy was
> starting and the retries are so quick that it didn't have time to recover.
>
> I'll simply change that to a literal IP address as all the others are.
>
> Thanks!
>
> Bill Waggoner
> --
> Bill Waggoner
> ad...@greybeard.org
> {Even Old Dogs can learn new tricks!}
>


Re: Health Checks not run before attempting to use backend

2018-04-13 Thread Dave Chiluk
Well, after having read your thread, that's disappointing.  As an alternative
to forcing healthchecks before the bind, it would be nice to have an option
to initially start all servers in the down state unless they are explicitly
loaded as up via a "show servers state" / "load-server-state-from-file"
combination.
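
For reference, what exists today looks roughly like this (paths are
illustrative); it carries the last known check state across a reload, but it
does not cover the brand-new-server case:

# haproxy.cfg
global
    stats socket /var/run/haproxy.sock mode 600 level admin
    server-state-file /var/lib/haproxy/server-state

defaults
    load-server-state-from-file global

# dumped just before each reload so the new process picks the state up
$ echo "show servers state" | socat stdio unix-connect:/var/run/haproxy.sock \
      > /var/lib/haproxy/server-state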

Additionally, in a "seamless reload" configuration as we are using, would
it be possible for the new haproxy to complete a healthcheck on backends
after it has bound to the socket, but before it has signaled the old
haproxy, or am I missing another gotcha there?

Also we are doing all this using 1.8.7.

Thanks,
Dave



On Fri, Apr 13, 2018 at 12:35 PM Jonathan Matthews <cont...@jpluscplusm.com>
wrote:

> On Fri, 13 Apr 2018 at 00:01, Dave Chiluk <chiluk+hapr...@indeed.com>
> wrote:
>
>> Is there a way to force haproxy to not use a backend until it passes a
>> healthcheck?  I'm also worried about the side effects this might cause as
>> requests start to queue up in the haproxy
>>
>
> I asked about this in 2014 ("Current solutions to the
> soft-restart-healthcheck-spread problem?") and I don't recall seeing a fix
> since then. Very interested in whatever you find out!
>
>
> J
>
>> --
> Jonathan Matthews
> London, UK
> http://www.jpluscplusm.com/contact.html
>


Health Checks not run before attempting to use backend

2018-04-12 Thread Dave Chiluk
Hi, we're evaluating haproxy for use as the load balancer in front of our
mesos cluster.  What we are finding is that even though we have requested
the check option in the server line, haproxy attempts to serve traffic to
the server on startup until the first healthcheck completes.

server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32

This is because we are adding servers to haproxy as they are started in
mesos, but before our backend application itself is ready to serve
connections.  This results in spurious 503's being handed to clients as we
add backends via the admin socket or haproxy restart.  I looked into
possibly forcing a healthcheck during the cfgparse constructors, but that
seems like it would require some significant rearchitecting.

Is there a way to force haproxy to not use a backend until it passes a
healthcheck?  I'm also worried about the side effects this might cause as
requests start to queue up in the haproxy.

Thanks,
Dave


Seamless reloads and init scripts, and nbproc > 1

2017-09-07 Thread Dave Chiluk
I'm trying to write what amounts to an init/startup script for haproxy with
a patched version of 1.7.8 that includes the seamless reload patches
described in this blog post:
https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/

#1. If haproxy dies or was killed for some reason, the stats socket still
exists, and when you try to relaunch haproxy with the -x option you get:
[ALERT] 249/165956 (2750) : Failed to get the sockets from the old process!

It's not impossible, but it's pretty messy to determine if the stats socket
has a valid old process listening on it when trying to relaunch/reload
haproxy.  Is there a solution for this that I'm not seeing?  Otherwise, when
you first launch haproxy you have to do so without the -x and then later
have to conditionally include it, and then check whether you succeeded.

Here's an excerpt from a bash init script as an example of the pain I'm
going through.
unset RELOADSOCK
if [ -e "${STATSFILE}" ] ; then
    RELOADSOCK="-x ${STATSFILE}"
    sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE $RELOADSOCK \
        -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
    if [ $? == 1 ] ; then
        # We likely had difficulty reading the stats file.  Delete it and
        # run normally.
        rm ${STATSFILE}
        sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE \
            -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
    fi
else
    sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE $RELOADSOCK \
        -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
fi
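
One thing I've been toying with (untested, just a sketch) is to only pass -x
when something actually answers on the old socket, e.g.:

if [ -S "${STATSFILE}" ] && \
   echo "show info" | socat -t 1 stdio "unix-connect:${STATSFILE}" >/dev/null 2>&1 ; then
    RELOADSOCK="-x ${STATSFILE}"
else
    RELOADSOCK=""
fi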

Other than that, I have seen no ill effects yet when using -x for socket
passing, and I can confirm that it has resolved some disconnects.

Thanks,
Dave.
p.s. The above script is not for Ubuntu, but for my day job.