maxsslconn vs maxsslrate

2018-06-06 Thread Mihir Shirali
Hi Team,

We use haproxy to front TLS for a large number of endpoints: haproxy
processes the TLS session and then forwards the request to the backend
application.
What we have noticed is that if there are a large number of connections
from different clients, the CPU usage goes up significantly. This is
primarily because haproxy is handling a lot of SSL connections. I came
across the two options above and tested them out.

With maxsslrate, CPU is better controlled, and if I combine this with a 503
response in the frontend I see great results. Is there a possibility of a
connection timeout on the client here if there is a very large number of
requests?

With maxsslconn, CPU is still pegged high, and clients receive a TCP
reset. This is also good, because there is no chance of a TCP timeout on the
client: clients can retry after a bit, and they know the connection was
closed instead of waiting on a timeout. However, CPU still seems pegged
high. What is the reason for the high CPU on the server here? Is it because
the SSL stack is still hit with this setting?

-- 
Regards,
Mihir


Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Baptiste
Hi Tait

A few comments inline:

1. Reloading with SRV records ignores server-state-file
> - While this is not a huge deal, it does mean that the backend in
> question becomes unavailable when the proxy is reloaded until the SRV and
> subsequent A records are resolved
>
- I understand that the point of these records is to dynamically build
> the backend, and populating servers from outdated and potentially invalid
> information is not ideal, but it might be nice to seed the backend before
> the first SRV query returns
>

This should not happen and it's a known issue that we're working on.
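For context, the state file is normally refreshed from the running process
right before the reload. A minimal sketch, assuming a stats socket at
/var/run/haproxy.sock (not shown in the quoted config) and the
server-state-base/server-state-file paths below:

echo "show servers state" | socat /var/run/haproxy.sock - > /var/lib/haproxy/state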



> - Below is the related config since it is very likely that I have
> misconfigured something :)
>
> globals
> ...
> server-state-base /var/lib/haproxy
> server-state-file state
>
> defaults
> ...
> load-server-state-from-file global
> default-server init-addr last,libc,none
>
> ...
>
> resolvers sd
> nameserver sd 127.0.0.1:
>
> backend dynamic
> server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
> resolve-prefer ipv4 check
>
> 2. Additional record responses from the nameserver are not parsed
>

This is true, by design.


> - This just means that any servers that are populated from the SRV
> records require a second round of querying for each of the hosts after the
> fqdn is stored. It might be more efficient if these records are also parsed
> but I can see that it might be pretty challenging in the current DNS
> resolver
> - Only reason I thought of this was to try and reduce the time it
> takes to populate the backend servers with addresses in an effort to lessen
> the effects of #1
>

Actually, I tested many DNS servers and some of them simply did not send the
additional records when they could not fit in the response (the payload was
too small for the number of SRV records).
Technically, we could try to use the additional records when available and
fall back to the current way of working when none are found.
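Purely to illustrate what "additional records" means here, a hypothetical
zone excerpt; a nameserver answering the SRV query may place the A records
of the targets in the additional section of the same response:

_http._tcp.serviceA.tait.testing. 60 IN SRV 0 0 8080 node1.tait.testing.
_http._tcp.serviceA.tait.testing. 60 IN SRV 0 0 8080 node2.tait.testing.
node1.tait.testing.               60 IN A   10.0.0.11
node2.tait.testing.               60 IN A   10.0.0.12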



> I'm happy with the workaround I'll be pursuing for now where my SD service
> (that originally was going to be a resolver and populate via SRV records)
> is going to write all the backend definitions to disk so this is not a
> pressing issue, just thought I'd share the limitations I discovered. My
> knowledge of C (and the internal workings of HAProxy) is not great
> otherwise this would probably be a patch submission for #1 :)
>
> Tait
>
>
I'll check that for you. (In the meantime, please keep on answering
Aleksandar's emails; the more info I have, the better.)

Baptiste


Re: [PATCH 1/2] MEDIUM: add set-priority-class and set-priority-offset

2018-06-06 Thread Patrick Hemmer


On 2018/5/31 00:57, Willy Tarreau wrote:
> Hi Patrick,
>
> On Thu, May 31, 2018 at 12:16:27AM -0400, Patrick Hemmer wrote:
>>> I looked at the code to see if something could cause that. I found that the
>>> key increment could be a reason (you must restart from the next element,
>>> not an upper value since there will be many duplicate keys) 
>> Gah, I completely forgot about duplicate keys. Will fix.
> I'll send you (later today) some split patches that will help for
> this. This simplifies review and experimentation on the walk algorithm.
I kept the patch set you created, and have made some minor adjustments
to address the mentioned issues.

I think the only uncertainty I had during implementation was an atomic
operator for retrieving the queue_idx value inside
stream_process_counters. There's no load operator, but I'm not sure if
that is by design, or what the preferred solution is here. In theory if
the value is aligned, the load is atomic anyway and we don't need an
explicit atomic operator. But I'm not sure if that's an assumption we
want to make.
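To illustrate the distinction, a small standalone sketch (not haproxy code,
and the field name is hypothetical), using the GCC/clang __atomic builtins
as a stand-in for whatever operator would be added:

#include <stdio.h>

static unsigned int queue_idx;  /* stand-in for the field discussed above */

int main(void)
{
    /* Relies on the platform making aligned word-sized reads atomic. */
    unsigned int plain = queue_idx;

    /* Makes the atomicity explicit and visible to the compiler. */
    unsigned int explicit_load = __atomic_load_n(&queue_idx, __ATOMIC_RELAXED);

    printf("%u %u\n", plain, explicit_load);
    return 0;
}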

>> The warning is addressable. It means the user should add a `timeout
>> queue` and set it less than 524287ms. This is similar to the "missing
>> timeouts for frontend/backend" warning we print. Though I think it would
>> make sense to add "queue" to that existing warning instead, as it
>> already mentions the timeouts for client, connect & server.
> In fact the problem I'm having is that I've already seen some very high
> timeout values in the field (e.g. on a print server), which means that some
> people do need to have a large value here. For sure they don't need to
> reach 24 days but 8min will definitely be too short for certain extreme
> cases. And I understand the impact of such large values with the queue,
> I just don't know how we can address this at the moment, it's what I'm
> still thinking about.
>
>>> We need to find a more suitable name for this "cntdepend" as it really 
>>> doesn't
>>> tell me that it has anything to do with what it says. I don't have anything
>>> to propose for now I'm afraid.
>> Yeah, not fond of the name, but it was something, and is easy to change.
>> The name originated from "cnt"=counter, and "pend" to be consistent with
>> the naming of "nbpend" and "totpend" right above it. So: counter of
>> de-queued pending connections.
> OK. In fact given that it's a wrapping counter it's why I considered it
> could be an index for the srv and px parts, but the stream part stores
> a distance between the recorded index and the current index.
>
>>> @@ -2467,6 +2469,10 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
>>> return t; /* nothing more to do */
>>> }
>>>  
>>> +   // remove from pending queue here so we update counters
>>> +   if (s->pend_pos)
>>> +   pendconn_free(s->pend_pos);
>>> +
>>> if (s->flags & SF_BE_ASSIGNED)
>>> HA_ATOMIC_SUB(&s->be->beconn, 1);
>>>
>>> This part is not supposed to be needed since it's already performed
>>> in stream_free() a few lines later. Or if it's required to get the
>>> queue values correctly before logging, then something similar will
>>> be required at every place where we log after trying to connect to
>>> a server (there are a few places depending on the log format and
>>> the logasap option).
>> Yes, it was put there to update the counters before logging. Re-thinking
>> this, updating of the counters needs to be moved into
>> pendconn_process_next_stream anyway. Since this function updates
>> px->cntdepend and is called in a loop, calculating logs.prx_queue_pos
>> after this loop completes will yield incorrect values.
> I thought about the same but was not yet completely certain. I suspect in
> fact the stream's value should indeed be updated upon dequeue, and that
> only the error case has to be taken into account before logging (if there
> was no dequeue). In this case I think we can have this one performed here
> provided we ensure other log places will not be triggered on error.
>
>>> Now I think I understand how the cntdepend works : isn't it simply a
>>> wrapping queue index that you use to measure the difference between
>>> when you queued and when you dequeued ? In this case, wouldn't something
>>> like "queue_idx" be more explicit ? At least it becomes more obvious when
>>> doing the measure that we measure a length by subtracting two indexes.
>> It is a measure of a difference yes, but it's the difference between how
>> many items were processed off the queue. I personally wouldn't call it
>> an index as index implies position.
> I personally see it as a position. Well exactly a sequence number in fact :-)
If you are waiting in line at the cinema, and when you get in the queue
you're given a count of how many people the cinema has served
since it opened, that's not your position in the queue. Just as when
you're served, and are given a new number of how many 
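The mechanism being debated, as a standalone sketch (hypothetical names, not
haproxy code): the stream records the dequeue counter when it is queued, and
the unsigned difference taken later is how many entries were processed off
the queue in between, even across wraparound:

#include <stdio.h>

static unsigned int dequeue_count;          /* incremented on every dequeue */

int main(void)
{
    unsigned int recorded = dequeue_count;  /* snapshot taken at enqueue time */

    /* ... later, pretend five entries were dequeued ahead of us ... */
    dequeue_count += 5;

    printf("processed ahead of us: %u\n", dequeue_count - recorded);
    return 0;
}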

Re: haproxy-1.8.8 seamless reloads failing with abns@ sockets

2018-06-06 Thread Olivier Houchard
Hi Jarno,

On Sat, May 12, 2018 at 06:04:10PM +0300, Jarno Huuskonen wrote:
> Hi,
> 
> I'm testing 1.8.8(1.8.8-52ec357 snapshot) and seamless reloads
> (expose-fd listeners).
> 
> I'm testing with this config (missing some default timeouts):
> --8<
> global
> stats socket /tmp/stats level admin expose-fd listeners
> 
> defaults
> mode http
> log global
> option httplog
> retries 2
> timeout connect 1500ms
> timeout client  10s
> timeout server  10s
> 
> listen testme
> bind ipv4@127.0.0.1:8080
> server test_abns_server abns@wpproc1 send-proxy-v2
> 
> frontend test_abns
> bind abns@wpproc1 accept-proxy
> http-request deny deny_status 200
> --8<
> 
> Reloads (kill -USR2 $(cat /tmp/haproxy.pid)) are failing:
> "Starting frontend test_abns: cannot listen to socket []"
> (And request to 127.0.0.1:8080 timeout).
> 
> I guess the problem is that on reload, haproxy is trying
> to bind the abns socket again, because (proto_uxst.c) uxst_bind_listener /
> uxst_find_compatible_fd doesn't find existing (the one copied over from
> old process) file descriptor for this abns socket.
> 
> Is uxst_find_compatible_fd only looking for .X.tmp sockets
> and ignoring abns sockets where path starts with \0 ?
> 
> Using unix socket instead of abns socket makes the reload work.
> 

Sorry for the late answer.

You're right indeed, that code was not written with abns sockets in mind.
The attached patch should fix it. It was created from master, but should
apply to 1.8 as well.

Thanks !

Olivier
From 3ba0fbb7c9e854aafb8a6b98482ad7d23bbb414d Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 18:34:34 +0200
Subject: [PATCH] MINOR: unix: Make sure we can transfer abns sockets as well
 on seamless reload.

When checking if a socket we got from the parent is suitable for a listener,
we just checked that the path matched sockname.tmp, however this is
unsuitable for abns sockets, where we don't have to create a temporary
file and rename it later.
To detect that, check that the first character of the sun_path is 0 for
both, and if so, that &sun_path[1] is the same too.

This should be backported to 1.8.
---
 src/proto_uxst.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/proto_uxst.c b/src/proto_uxst.c
index 9fc50dff4..a1da337fe 100644
--- a/src/proto_uxst.c
+++ b/src/proto_uxst.c
@@ -146,7 +146,12 @@ static int uxst_find_compatible_fd(struct listener *l)
after_sockname++;
if (!strcmp(after_sockname, ".tmp"))
break;
-   }
+   /* abns sockets sun_path starts with a \0 */
+   } else if (un1->sun_path[0] == 0
+   && un2->sun_path[0] == 0
+   && !strncmp(&un1->sun_path[1], &un2->sun_path[1],
+   sizeof(un1->sun_path) - 1))
+   break;
}
xfer_sock = xfer_sock->next;
}
-- 
2.14.3



Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Tait Clarridge
On Wed, Jun 6, 2018 at 11:35 AM Aleksandar Lazic  wrote:

> Hi Tait.
>
> On 06/06/2018 11:16, Tait Clarridge wrote:
> >I've been testing DNS service discovery and the use of SRV records and
> have
> >a few thoughts on a couple things that I noticed.
>
> There were a lot of changes in this area in the latest versions of haproxy.
> Have you tested the setup with the latest haproxy 1.8 release?
>
> Could you please be so kind as to send us the output of
>
> haproxy -vv
>
>
Hey Aleksandar,

I'm on 1.8.8 so I will try to give it a shot with 1.8.9 later this
afternoon. I've added the output below.

HA-Proxy version 1.8.8 2018/04/19
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-format-truncation -Wno-null-dereference
-Wno-unused-label
  OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1
USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0h-fips  27 Mar 2018
Running on OpenSSL version : OpenSSL 1.1.0g-fips  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.41 2017-07-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

Thanks,
Tait


Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Aleksandar Lazic

Hi Tait.

On 06/06/2018 11:16, Tait Clarridge wrote:

I've been testing DNS service discovery and the use of SRV records and have
a few thoughts on a couple things that I noticed.


There were a lot of changes in this area in the latest versions of haproxy.
Have you tested the setup with the latest haproxy 1.8 release?

Could you please be so kind as to send us the output of

haproxy -vv


1. Reloading with SRV records ignores server-state-file
   - While this is not a huge deal, it does mean that the backend in
question becomes unavailable when the proxy is reloaded until the SRV and
subsequent A records are resolved
   - I understand that the point of these records is to dynamically build
the backend, and populating servers from outdated and potentially invalid
information is not ideal, but it might be nice to seed the backend before
the first SRV query returns
   - Below is the related config since it is very likely that I have
misconfigured something :)

globals
   ...
   server-state-base /var/lib/haproxy
   server-state-file state

defaults
   ...
   load-server-state-from-file global
   default-server init-addr last,libc,none

...

resolvers sd
   nameserver sd 127.0.0.1:

backend dynamic
   server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
resolve-prefer ipv4 check

2. Additional record responses from the nameserver are not parsed
   - This just means that any servers that are populated from the SRV
records require a second round of querying for each of the hosts after the
fqdn is stored. It might be more efficient if these records are also parsed
but I can see that it might be pretty challenging in the current DNS
resolver
   - Only reason I thought of this was to try and reduce the time it
takes to populate the backend servers with addresses in an effort to lessen
the effects of #1

I'm happy with the workaround I'll be pursuing for now where my SD service
(that originally was going to be a resolver and populate via SRV records)
is going to write all the backend definitions to disk so this is not a
pressing issue, just thought I'd share the limitations I discovered. My
knowledge of C (and the internal workings of HAProxy) is not great
otherwise this would probably be a patch submission for #1 :)

Tait




Observations about reloads and DNS SRV records

2018-06-06 Thread Tait Clarridge
I've been testing DNS service discovery and the use of SRV records and have
a few thoughts on a couple things that I noticed.

1. Reloading with SRV records ignores server-state-file
- While this is not a huge deal, it does mean that the backend in
question becomes unavailable when the proxy is reloaded until the SRV and
subsequent A records are resolved
- I understand that the point of these records is to dynamically build
the backend, and populating servers from outdated and potentially invalid
information is not ideal, but it might be nice to seed the backend before
the first SRV query returns
- Below is the related config since it is very likely that I have
misconfigured something :)

globals
...
server-state-base /var/lib/haproxy
server-state-file state

defaults
...
load-server-state-from-file global
default-server init-addr last,libc,none

...

resolvers sd
nameserver sd 127.0.0.1:

backend dynamic
server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
resolve-prefer ipv4 check

2. Additional record responses from the nameserver are not parsed
- This just means that any servers that are populated from the SRV
records require a second round of querying for each of the hosts after the
fqdn is stored. It might be more efficient if these records are also parsed
but I can see that it might be pretty challenging in the current DNS
resolver
- Only reason I thought of this was to try and reduce the time it
takes to populate the backend servers with addresses in an effort to lessen
the effects of #1

I'm happy with the workaround I'll be pursuing for now where my SD service
(that originally was going to be a resolver and populate via SRV records)
is going to write all the backend definitions to disk so this is not a
pressing issue, just thought I'd share the limitations I discovered. My
knowledge of C (and the internal workings of HAProxy) is not great
otherwise this would probably be a patch submission for #1 :)

Tait


Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Willy Tarreau
On Wed, Jun 06, 2018 at 04:22:22PM +0200, Olivier Houchard wrote:
> The last patch depended on the first one, so without it that failure is
> expected.

And this confirms the benefit of catching such cases at build time :-)

> Thanks a lot for reporting and testing.
> 
> Willy, I think you can push both the patches.

OK now merged, thank you!

Willy



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
On Wed, Jun 06, 2018 at 10:06:30AM -0400, Patrick Hemmer wrote:
> 
> 
> On 2018/6/6 08:24, Olivier Houchard wrote:
> > Hi Willy,
> >
> > On Wed, Jun 06, 2018 at 02:09:01PM +0200, Willy Tarreau wrote:
> >> On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
> >>> When building without threads enabled, instead of just using the global
> >>> runqueue, just use the local runqueue associated with the only thread, as
> >>> that's what is now expected for a single thread in 
> >>> process_runnable_tasks().
> >> Just out of curiosity, shouldn't we #ifdef out the global runqueue
> >> definition when running without threads in order to catch such cases
> >> in the future ?
> >>
> > I think this is actually a good idea.
> > My only concern is it adds quite a bit of #ifdef USE_THREAD, see the 
> > attached
> > patch.
> >
> > Regards,
> >
> > Olivier
> With this patch I'm getting:
> 
> include/proto/task.h:138:26: error: use of undeclared identifier 'rqueue'
> 
> With the previous patch, both reported issues are resolved.
> 

Hi Patrick,

The last patch depended on the first one, so without it that failure is
expected.

Thanks a lot for reporting and testing.

Willy, I think you can push both the patches.

Regards,

Olivier



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Patrick Hemmer


On 2018/6/6 08:24, Olivier Houchard wrote:
> Hi Willy,
>
> On Wed, Jun 06, 2018 at 02:09:01PM +0200, Willy Tarreau wrote:
>> On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
>>> When building without threads enabled, instead of just using the global
>>> runqueue, just use the local runqueue associated with the only thread, as
>>> that's what is now expected for a single thread in process_runnable_tasks().
>> Just out of curiosity, shouldn't we #ifdef out the global runqueue
>> definition when running without threads in order to catch such cases
>> in the future ?
>>
> I think this is actually a good idea.
> My only concern is it adds quite a bit of #ifdef USE_THREAD, see the attached
> patch.
>
> Regards,
>
> Olivier
With this patch I'm getting:

include/proto/task.h:138:26: error: use of undeclared identifier 'rqueue'

With the previous patch, both reported issues are resolved.

-Patrick


Your competitors already know this

2018-06-06 Thread Robin Jackson
Hello,



Hope you are doing great.



During our standard search procedure, we came across your website, which is
ranking for some of the most potential keywords (products). But we feel it
is unfortunate that despite having a nicely built and user-friendly
website, yours is still very far from the 1st page of Google’s organic
ranking.

Your loss is your competitor’s gain i.e. the traffic which could have
generated quality sales for you goes to your competitors, as they rank well
in the Search Engine Result Pages (SERPs) organically.



Hope you are using some sort of promotional activities for ranking your
website. But we sincerely feel that the efforts are not up to the desired
level. We are one of the most trusted service providers offering SEO
(Search Engine Optimization) services globally. The target is to make your
website appear on the first page of Google search within minimal time and
with a cost-effective promotion.



We adopt:



•The latest guidelines of Google for significant and long-lasting
results.


•Best techniques to bring your website to the 1st page of Google.


•Creative Digital Marketing campaigns to establish your business
brand.


•Unique Social Media promotions to expand the customer base.


•Round-the-clock customer support team for assistance anytime.


Would you prefer to know the modalities and promotion strategies we adopt
for marketing your website? We guarantee you visible results from the very
1st month. Just give it a try! No minimum contract! No obligation!
Absolutely no compulsion!



Revert back with your query and we will respond you promptly to offer you a
sustainable solution for online marketing.



We will always welcome your thoughts and comments.

With Warm Regards,
Robin Jackson | Online Marketing Consultant
Email: ro...@webprotop.com
Skype: seo.seophalanx
Phone: +91 8917297372


Note:- If you are interested then my Sales Manager will come back to you
with an affordable SEO & Digital Marketing plan which contains our
services,  client reference, price list etc.
[image: beacon]


Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Willy Tarreau
On Wed, Jun 06, 2018 at 02:24:29PM +0200, Olivier Houchard wrote:
> > Just out of curiosity, shouldn't we #ifdef out the global runqueue
> > definition when running without threads in order to catch such cases
> > in the future ?
> > 
> 
> I think this is actually a good idea.
> My only concern is it adds quite a bit of #ifdef USE_THREAD, see the attached
> patch.

Indeed. A few of them may be replaced with the magical __decl_hathreads()
macro, but that will remain marginal. Well, it's not that bad either, just
let me know what you prefer, I'm fine with both options.
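Roughly how such a macro collapses the #ifdef pair; an illustrative sketch
rather than the exact haproxy definition:

/* Declares its argument only when threads are compiled in. */
#ifdef USE_THREAD
#define __decl_hathreads(decl) decl
#else
#define __decl_hathreads(decl)
#endif

/* One line instead of wrapping each declaration in #ifdef USE_THREAD. */
__decl_hathreads(extern int global_rqueue_size;)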

Willy



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
Hi Willy,

On Wed, Jun 06, 2018 at 02:09:01PM +0200, Willy Tarreau wrote:
> On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
> > When building without threads enabled, instead of just using the global
> > runqueue, just use the local runqueue associated with the only thread, as
> > that's what is now expected for a single thread in process_runnable_tasks().
> 
> Just out of curiosity, shouldn't we #ifdef out the global runqueue
> definition when running without threads in order to catch such cases
> in the future ?
> 

I think this is actually a good idea.
My only concern is it adds quite a bit of #ifdef USE_THREAD, see the attached
patch.

Regards,

Olivier
From baeb750ed13307010bfef39de92ec9bb8af54022 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 14:22:03 +0200
Subject: [PATCH] MINOR: tasks: Don't define rqueue if we're building without
 threads.

To make sure we don't inadvertently insert a task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
---
 include/proto/task.h |  2 ++
 src/task.c   | 28 +---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index dc8a54481..266246098 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -93,7 +93,9 @@ extern struct pool_head *pool_head_tasklet;
 extern struct pool_head *pool_head_notification;
 extern THREAD_LOCAL struct task *curr_task; /* task currently running or NULL 
*/
 extern THREAD_LOCAL struct eb32sc_node *rq_next; /* Next task to be 
potentially run */
+#ifdef USE_THREAD
 extern struct eb_root rqueue;  /* tree constituting the run queue */
+#endif
 extern struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the 
per-thread run queue */
 extern struct list task_list[MAX_THREADS]; /* List of tasks to be run, mixing 
tasks and tasklets */
extern int task_list_size[MAX_THREADS]; /* Number of tasks in the task_list */
diff --git a/src/task.c b/src/task.c
index 16c723230..c961725a1 100644
--- a/src/task.c
+++ b/src/task.c
@@ -49,9 +49,11 @@ __decl_hathreads(HA_SPINLOCK_T __attribute__((aligned(64))) 
rq_lock); /* spin lo
 __decl_hathreads(HA_SPINLOCK_T __attribute__((aligned(64))) wq_lock); /* spin 
lock related to wait queue */
 
 static struct eb_root timers;  /* sorted timers tree */
+#ifdef USE_THREAD
 struct eb_root rqueue;  /* tree constituting the run queue */
-struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the per-thread 
run queue */
static int global_rqueue_size; /* Number of elements in the global runqueue */
+#endif
+struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the per-thread 
run queue */
 static int rqueue_size[MAX_THREADS]; /* Number of elements in the per-thread 
run queue */
 static unsigned int rqueue_ticks;  /* insertion count */
 
@@ -68,10 +70,13 @@ void __task_wakeup(struct task *t, struct eb_root *root)
void *expected = NULL;
int *rq_size;
 
+#ifdef USE_THREAD
if (root == &rqueue) {
rq_size = &global_rqueue_size;
HA_SPIN_LOCK(TASK_RQ_LOCK, &rq_lock);
-   } else {
+   } else
+#endif
+   {
int nb = root - &rqueue_local[0];
rq_size = &rqueue_size[nb];
}
@@ -80,8 +85,10 @@ void __task_wakeup(struct task *t, struct eb_root *root)
 */
 redo:
if (unlikely(!HA_ATOMIC_CAS(&t->rq.node.leaf_p, &expected, (void *)0x1))) {
+#ifdef USE_THREAD
if (root == &rqueue)
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
+#endif
return;
}
/* There's a small race condition, when running a task, the thread
@@ -104,8 +111,10 @@ redo:
state = (volatile unsigned short)(t->state);
if (unlikely(state != 0 && !(state & TASK_RUNNING)))
goto redo;
+#ifdef USE_THREAD
if (root == &rqueue)
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
+#endif
return;
}
HA_ATOMIC_ADD(&tasks_run_queue, 1);
@@ -124,10 +133,13 @@ redo:
}
 
eb32sc_insert(root, &t->rq, t->thread_mask);
+#ifdef USE_THREAD
if (root == &rqueue) {
global_rqueue_size++;
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
-   } else {
+   } else
+#endif
+   {
int nb = root - &rqueue_local[0];
 
rqueue_size[nb]++;
@@ -239,7 +251,9 @@ void process_runnable_tasks()
 {
struct task *t;
int max_processed;
+#ifdef USE_THREAD
uint64_t average = 0;
+#endif
 
tasks_run_queue_cur = tasks_run_queue; /* keep a copy for reporting */
nb_tasks_cur = nb_tasks;
@@ -253,6 +267,7 @@ void process_runnable_tasks()
return;
}
 
+#ifdef USE_THREAD
average = tasks_run_queue / global.nbthread;
 
/* Get some elements from the global run queue and 

Re: HAProxy - Server Timeout and Client Timeout

2018-06-06 Thread Jarno Huuskonen
Hi,

On Tue, Jun 05, Martel, Michael H. wrote:
> We're running HAproxy 1.5.18 on RedHat Enterprise 7.4, as the load balancer 
> for our LMS (Moodle).  We have found that the course backup feature in Moodle 
> will return a 5xx error on some backups.  We have determined that the 
> "timeout server" value needed to be increased.

Do these backup requests have specific URLs that you can match with an ACL?

If you use a separate backend for Moodle backups, then it should be
possible to increase timeout server for just the backup requests.

Something like
frontend fe_moodle
  acl backup_req path_sub /something/backup
  use_backend moodle_backup if backup_req
  default_backend moodle
...
backend moodle
  timeout server 1m
...

backend moodle_backup
  timeout server 12m
  server moodle1 ... track moodle/moodle1 ...
  server moodle2 ... track moodle/moodle2 ...

> Initially we were using a "timeout client 1m" and "timeout server 1m" .  
> Adjusting the server to "timeout server 12m" fixes the problem and does not 
> appear to introduce any other issues in our testing.
> 
> I can't see any reason that I should have the "timeout client" and the 
> "timeout server" set to the same value.
> 
> Is there anything I should watch out for after increasing the "timeout 
> server" by such a large amount ?

Probably not, but AFAIK if the backend server "dies" after haproxy has
forwarded the request (and before the server responds), then the client has
to wait for the server timeout (in reality I think everyone will just click
stop or reload instead of waiting for the really long timeout).

-Jarno

-- 
Jarno Huuskonen



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Willy Tarreau
On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
> When building without threads enabled, instead of just using the global
> runqueue, just use the local runqueue associated with the only thread, as
> that's what is now expected for a single thread in process_runnable_tasks().

Just out of curiosity, shouldn't we #ifdef out the global runqueue
definition when running without threads in order to catch such cases
in the future ?

Willy



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
Hi Patrick,

On Tue, Jun 05, 2018 at 05:02:41PM -0400, Patrick Hemmer wrote:
> It seems that commit b0bdae7 has completely broken haproxy for me. When
> I send a request to haproxy, it just sits there. The backend server
> receives nothing, and the client waits for a response.
> Running with debug enabled I see just a single line:
> :f1.accept(0004)=0005 from [127.0.0.1:63663] ALPN=
> 
> commit b0bdae7b88d53cf8f18af0deab6d4c29ac25b7f9 (refs/bisect/bad)
> Author: Olivier Houchard 
> Date:   Fri May 18 18:45:28 2018 +0200
> 
> MAJOR: tasks: Introduce tasklets.
>
> Introduce tasklets, lightweight tasks. They have no notion of priority,
> they are just run as soon as possible, and will probably be used for I/O
> later.
>
> For the moment they're used to replace the temporary thread-local list
> that was used in the scheduler. The first part of the struct is common
> with tasks so that tasks can be cast to tasklets and queued in this
> list.
> Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
> it cannot accidentally be confused as not in the queue.
>
> Pure tasklets are identifiable by their nice value of -32768 (which is
> normally not possible).
> 
> Issue reproducible with a very simple config:
> 
> defaults
>   mode http
> frontend f1
>   bind :8081
>   default_backend b1
> backend b1
>   server s1 127.0.0.1:8081
> 
> Compiled on OS-X with only a single make variable of TARGET=osx
> Compiler: clang-900.0.39.2
> 
> 

Oops, it seems I broke haproxy when built without thread support.
The attached patch should fix both of the issues you reported; can you
confirm it?

Thanks a lot !

Olivier
From d3c0abb18b44a942dcc7ead072be84f323184d0f Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 14:01:08 +0200
Subject: [PATCH] BUG/MEDIUM: tasks: Use the local runqueue when building
 without threads.

When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in process_runnable_tasks().
This should fix haproxy when built without threads.
---
 include/proto/task.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index 0c2c5f28c..dc8a54481 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -133,7 +133,7 @@ static inline void task_wakeup(struct task *t, unsigned int f)
else
root = &rqueue;
 #else
-   struct eb_root *root = &rqueue;
+   struct eb_root *root = &rqueue_local[tid];
 #endif
 
state = HA_ATOMIC_OR(&t->state, f);
-- 
2.14.3



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-06 Thread Willy Tarreau
Hi Milan,

On Wed, Jun 06, 2018 at 11:09:19AM +0200, Milan Petruzelka wrote:
> Hi Willy,
> 
> I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
> over last night. It started at 17:19 like this:
> 
> "Packet No.","Time in
> seconds","Source","Destination","Protocol","Length","Info"
> "1","0.00","ip_client","ip_haproxy_server","TCP","62","64311  >
> 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
> "2","0.001049","ip_haproxy_server","ip_client","TCP","62","443  >
> 64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
> "3","0.127239","ip_client","ip_haproxy_server","TCP","54","64311  >
> 443 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
> "4","0.127344","ip_client","ip_haproxy_server","TLSv1.2","571","Client
> Hello"
> "5","0.130304","ip_haproxy_server","ip_client","TLSv1.2","2974","Server
> Hello, Certificate"
> "6","0.130336","ip_haproxy_server","ip_client","TLSv1.2","310","Server
> Key Exchange, Server Hello Done"
> 
> After some 13 seconds, the client sent its last data, which the haproxy
> server acknowledged.
> 
> 
> "319","13.781347","ip_client","ip_haproxy_server","TLSv1.2","96","Application
> Data"
> "320","13.781365","ip_haproxy_server","ip_client","TCP","54","443  >
> 64311 [ACK] Seq=240156 Ack=3689 Win=36448 Len=0"
> 
> Then the client sent a FIN packet, and the server acknowledged it again
> 
> "321","16.292016","ip_client","ip_haproxy_server","TCP","54","64311  >
> 443 [FIN, ACK] Seq=3689 Ack=240156 Win=64240 Len=0"
> "322","16.329574","ip_haproxy_server","ip_client","TCP","54","443  >
> 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0"
> 
> From then on, the client sent only TCP keepalives every 45s, which the
> server always acknowledged.
> 
> "323","61.443121","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
> "324","61.443216","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
> "325","106.528926","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
> "326","106.529117","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
> ...
> 
> After some 4.5 hours (at 21:51) the client sent its last keepalive, which
> the server acknowledged. There were no more packets after that.
> 
> "1043","16284.644240","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
> "1044","16284.644354","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
> "1045","16329.797223","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
> "1046","16329.797274","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
> 
> The next morning at 10:40 I can still see the hanging connection on
> the server:
> 
> netstat -aptn|grep 64311
> tcp  430  0 ip_haproxy_server:443  ip_client:64311
>  CLOSE_WAIT  916/haproxy
> 
> lsof|grep 64311
> haproxy 916  haproxy   40u IPv4  106204553
>   0t0TCP ip_haproxy_server:https->ip_client:64311 (CLOSE_WAIT)
> 
> echo "show fd" | socat - $HASOCK | grep "40 :"
> 40 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1648d80
> iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
> fe=fe-http mux=H2 mux_ctx=0x15e9460
> 
> I hope this can help in tracking the problem down.

Sure, it's extremely useful. It looks like a shutdown is ignored when
waiting for the H2 preface. The FD is not being polled for reading, so
I tend to conclude that either it was disabled for a reason I still do
not know, or it used to be enabled, the shutdown was received then the
polling was disabled. But that doesn't appear in the connection flags.
So it seems that the transition between the end of handshake and the
start of parsing could be at fault. Maybe we refrain from entering the
H2 state machine because of the early shutdown. I'm going to have a look
in that direction.

Thank you!
Willy



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-06 Thread Milan Petruželka
Hi Willy,

I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
over last night. It started at 17:19 like this:

"Packet No.","Time in
seconds","Source","Destination","Protocol","Length","Info"
"1","0.00","ip_client","ip_haproxy_server","TCP","62","64311  >
443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
"2","0.001049","ip_haproxy_server","ip_client","TCP","62","443  >
64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
"3","0.127239","ip_client","ip_haproxy_server","TCP","54","64311  >
443 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
"4","0.127344","ip_client","ip_haproxy_server","TLSv1.2","571","Client
Hello"
"5","0.130304","ip_haproxy_server","ip_client","TLSv1.2","2974","Server
Hello, Certificate"
"6","0.130336","ip_haproxy_server","ip_client","TLSv1.2","310","Server
Key Exchange, Server Hello Done"

After some 13 seconds, the client sent its last data, which the haproxy server
acknowledged.


"319","13.781347","ip_client","ip_haproxy_server","TLSv1.2","96","Application
Data"
"320","13.781365","ip_haproxy_server","ip_client","TCP","54","443  >
64311 [ACK] Seq=240156 Ack=3689 Win=36448 Len=0"

Then the client sent a FIN packet, and the server acknowledged it again

"321","16.292016","ip_client","ip_haproxy_server","TCP","54","64311  >
443 [FIN, ACK] Seq=3689 Ack=240156 Win=64240 Len=0"
"322","16.329574","ip_haproxy_server","ip_client","TCP","54","443  >
64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0"

From then on, the client sent only TCP keepalives every 45s, which the server
always acknowledged.

"323","61.443121","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"324","61.443216","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"325","106.528926","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"326","106.529117","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
...

After some 4.5 hours (at 21:51) the client sent its last keepalive, which the
server acknowledged. There were no more packets after that.

"1043","16284.644240","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1044","16284.644354","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"1045","16329.797223","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1046","16329.797274","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"

The next morning at 10:40 I can still see the hanging connection on
the server:

netstat -aptn|grep 64311
tcp  430  0 ip_haproxy_server:443  ip_client:64311
 CLOSE_WAIT  916/haproxy

lsof|grep 64311
haproxy 916  haproxy   40u IPv4  106204553
  0t0TCP ip_haproxy_server:https->ip_client:64311 (CLOSE_WAIT)

echo "show fd" | socat - $HASOCK | grep "40 :"
40 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1648d80
iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x15e9460

I hope this can help in tracking the problem down.

Best regards,
Milan