Re: directing requests to a specific server
Perfect! Thanks Tim. There are so many options in the HAProxy configuration that I sometimes get lost in it.

> On May 23, 2019, at 12:09 PM, Tim Düsterhus wrote:
>
> Paul,
>
> On 23.05.19 at 20:17, Paul Lockaby wrote:
>> If there is a way that I can direct a request to a specific server in a
>> backend rather than duplicating backends with different server lists that
>> would be ideal. Is that possible?
>
> I believe you are searching for use-server:
> https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4.2-use-server
>
> Best regards
> Tim Düsterhus
Re: directing requests to a specific server
Paul,

On 23.05.19 at 20:17, Paul Lockaby wrote:
> If there is a way that I can direct a request to a specific server in a
> backend rather than duplicating backends with different server lists that
> would be ideal. Is that possible?

I believe you are searching for use-server:
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4.2-use-server

Best regards
Tim Düsterhus
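To illustrate, a rough sketch of how use-server could route the per-node paths inside a single backend, without duplicating backends (the paths, server names and ports are illustrative, not a tested configuration):

```
backend monitor_cluster
    mode http
    balance source
    hash-type consistent
    acl want_host01 path_beg /monitor/node/host01
    acl want_host02 path_beg /monitor/node/host02
    acl want_host03 path_beg /monitor/node/host03
    # pin the request to one server when a node was named in the path;
    # all other requests keep using the configured balance algorithm
    use-server host01 if want_host01
    use-server host02 if want_host02
    use-server host03 if want_host03
    server host01 host01.example.com:1234 check
    server host02 host02.example.com:1234 check
    server host03 host03.example.com:1234 check
```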
directing requests to a specific server
Hello!

I have a frontend/backend that looks kind of like below, obviously very simplified.

    frontend myhost-frontend
        bind *:443 ssl crt /usr/local/ssl/certs/host.pem
        mode http
        log global
        acl request_monitor_cluster path_beg /monitor/cluster
        use_backend monitor_cluster if request_monitor_cluster
        # otherwise send requests to this backend
        default_backend myhost-backend

    backend monitor_cluster
        mode http
        log global
        balance source
        hash-type consistent
        option httpchk GET /haproxy/alive.txt
        http-check disable-on-404
        server host01 host01.example.com:1234 check
        server host02 host02.example.com:1234 check
        server host03 host03.example.com:1234 check

The key thing here is that if someone goes to https://myhost.example.com/monitor/cluster they access the monitoring system on some arbitrary host in the cluster. I'd like to be able to do something where, if I go to, say, https://myhost.example.com/monitor/node/host01, I reach the monitoring system on a specific node.

My first thought is that I would have to make three more backends, one for each node in the cluster, and each backend would only have one server in it. If there is a way that I can direct a request to a specific server in a backend rather than duplicating backends with different server lists, that would be ideal. Is that possible?
Re: Sticky-table persistence in a Kubernetes environment
Hi Eduardo,

On Thu, May 23, 2019 at 10:09:55AM -0300, Eduardo Doria Lima wrote:
> Hi Aleks,
>
> I don't understand what you means with "local host". But could be nice if
> new process get data of old process.

That's exactly the principle. A peers section contains a number of peers, including the local one. For example, let's say you have 4 haproxy nodes; all of them will have the exact same section:

    peers my-cluster
        peer node1 10.0.0.1:1200
        peer node2 10.0.0.2:1200
        peer node3 10.0.0.3:1200
        peer node4 10.0.0.4:1200

When you start haproxy, it checks if there is a peer with the same name as the local machine; if so, it considers it the local peer and will try to synchronize the full tables with it. Normally what this means is that the old process connects to the new one to teach it everything.

When your peers don't hold the same name, you can force it on the command line using -L to give the local peer name, e.g. "-L node3".

Also, be sure to properly reload, not restart! The restart (-st) will kill the old process without leaving it a chance to resynchronize! The reload (-sf) will tell it to finish its work then quit, and among its work there's the resync job ;-)

> As I said to João Morais, we "solve" this problem adding a sidecar HAProxy
> (another container in same pod) only to store the sticky-table of main
> HAProxy. In my opinion it's a resource waste, but this is best solution now.

That's a shame because the peers naturally support not losing tables on reload, so indeed your solution is way more complex!

Hoping this helps,
Willy
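For reference, a backend's stick-table is tied to such a peers section with the "peers" keyword, so its entries get exchanged (and survive a reload); a minimal sketch with illustrative names and addresses:

```
peers my-cluster
    peer node1 10.0.0.1:1200
    peer node2 10.0.0.2:1200

backend app
    # entries in this table are synchronized with the peers above
    stick-table type ip size 100k expire 30m peers my-cluster
    stick on src
    server s1 192.168.0.10:8080 check
```

The graceful reload described above would then look something like `haproxy -f /etc/haproxy/haproxy.cfg -L node1 -sf $(cat /var/run/haproxy.pid)` (paths are placeholders), so the old process can push its tables to the new one before quitting.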
Re: RFE: insert server into peer section
Hi Aleks,

On Thu, May 23, 2019 at 01:12:48PM +0200, Aleksandar Lazic wrote:
> We had an interesting discussion at Kubeconf on how a session table in peers
> can survive a restart of a haproxy instance.
>
> We came to a request for enhancement (RFE) to be able to add a peer server,
> not a peer section, to an existing peers section, similar to "add server"
> for a backend.

It's not entirely clear to me in which case it would be useful. Do you need to deploy new haproxy nodes on the fly and to avoid restarting the other ones so they see the new one as a peer, maybe?

Thanks to the changes Fred did, which made the peers become totally regular servers, I suspect it wouldn't be too hard to make the server-template mechanism work the same way for peers. It's just that I'm not sure about the expected benefits.

Cheers,
Willy
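For comparison, this is roughly what the existing server-template mechanism looks like for a backend; the idea above would presumably need something analogous inside a peers section (the names, count and resolvers section are illustrative assumptions):

```
backend app
    # pre-provision 5 server slots srv1..srv5 whose addresses are
    # filled in at runtime (e.g. from DNS via a resolvers section)
    server-template srv 5 _http._tcp.app.example.local resolvers mydns check
```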
Re: [ANNOUNCE] haproxy-2.0-dev4
On Thu, May 23, 2019 at 07:35:43PM +0500, Ilya Shipitsin wrote:
> we can definitely cache "git clone" for BoringSSL, I'll send patch.

OK!

> as for "build cache", it might be not that trivial.

No problem, I'm only suggesting. What matters the most to me is that it works fine and causes few false positives (one once in a while is OK). The second goal is to make efficient use of the resources they assign us for free. The third one is that it shows a low latency. The last two goals tend to depend on the same principles :-)

Willy
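One common way to cache the BoringSSL clone on Travis CI is its built-in directory cache; a sketch (the directory name is an assumption, not the actual patch):

```yaml
# .travis.yml fragment: keep the BoringSSL checkout between builds
cache:
  directories:
    - $HOME/boringssl
```

The build script would then only `git clone` when the cached directory is missing, which avoids re-fetching on every run.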
Re: [ANNOUNCE] haproxy-2.0-dev4
On Thu, May 23, 2019 at 18:45, Willy Tarreau wrote:
> On Thu, May 23, 2019 at 04:17:33PM +0500, Ilya Shipitsin wrote:
> > I'd like to run sanitizers on various combinations, like ZLIB / SLZ,
> > PCRE / PCRE2 ...
> >
> > ok, let us do it before Wednesday
>
> OK, why not. Feel free to send patches once you can test them. Please
> make sure not to unreasonably increase the build time by multiplying
> the build combinations; right now it provides good value because you
> get the result while still working on the subject, this is an essential
> feature! And I'd even say that the boringssl build is extremely long and
> hinders this a little bit; I don't know why it is like this. If we could
> also save some resources on their infrastructure by keeping some prebuilt
> stuff somewhere, that would be great, but I have no idea whether it's
> possible to cache some data (e.g. rebuild the components only once a
> day).

We can definitely cache the "git clone" for BoringSSL, I'll send a patch.

As for the "build cache", it might be not that trivial.

> Cheers,
> Willy
Re: [ANNOUNCE] haproxy-2.0-dev4
On Thu, May 23, 2019 at 04:17:33PM +0500, Ilya Shipitsin wrote:
> I'd like to run sanitizers on various combinations, like ZLIB / SLZ, PCRE /
> PCRE2 ...
>
> ok, let us do it before Wednesday

OK, why not. Feel free to send patches once you can test them. Please make sure not to unreasonably increase the build time by multiplying the build combinations; right now it provides good value because you get the result while still working on the subject, this is an essential feature! And I'd even say that the boringssl build is extremely long and hinders this a little bit; I don't know why it is like this. If we could also save some resources on their infrastructure by keeping some prebuilt stuff somewhere, that would be great, but I have no idea whether it's possible to cache some data (e.g. rebuild the components only once a day).

Cheers,
Willy
Re: Sticky-table persistence in a Kubernetes environment
Hi Aleks,

I don't understand what you mean with "local host". But it could be nice if the new process got the data of the old process.

As I said to João Morais, we "solved" this problem by adding a sidecar HAProxy (another container in the same pod) only to store the sticky-table of the main HAProxy. In my opinion it's a waste of resources, but it is the best solution for now.

I know João doesn't have time to implement the peers part now. But I'm trying to make some tests; if successful I can make a pull request.

Att,
Eduardo

On Thu, May 23, 2019 at 09:40, Aleksandar Lazic wrote:
> Hi Eduardo.
> [...]
> João, Baptiste and I talked about this topic at the kubeconf and there was
> the suggestion to add the "local host" in the peers section.
> When a restart happens, the new haproxy process asks the old haproxy
> process for the data.
>
> I don't know when João will have the time to implement the peers part.
>
> Regards
> Aleks
> [...]
Re: Haproxy infront of exim cluster - SMTP protocol synchronization error
Hi,

On Wed, May 22, Brent Clark wrote:
> 2019-05-22 12:23:15 SMTP protocol synchronization error (input sent
> without waiting for greeting): rejected connection from
> H=smtpgatewayserver [IP_OF_LB_SERVER] input="PROXY TCP4 $MY_IP
> $IP_OF_LB_SERVER 39156 587\r\n"

Seems like the proxy protocol is not enabled on exim.

> We use Exim and I set:
> hostlist haproxy_hosts = IP.OF.LB

Do you have hosts_proxy (https://www.exim.org/exim-html-current/doc/html/spec_html/ch-proxies.html) set/enabled?

-Jarno

> My haproxy config:
> https://pastebin.com/raw/JYAXkAq4
>
> If I run
> openssl s_client -host smtpgatewayserver -port 587 -starttls smtp -crlf
> openssl says connected, but SSL-Session is empty.
>
> I would like to say, if I change 'send-proxy' to 'check', then
> everything works, BUT the IP logged by Exim is that of the LB, and
> not the client.
>
> If anyone could please review the haproxy config / my setup, it
> would be appreciated.
>
> Many thanks
> Brent Clark

--
Jarno Huuskonen
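For reference, the two sides of the PROXY-protocol handshake described above might be configured like this (addresses and server names are placeholders, not from the original thread):

```
# Exim main configuration: accept the PROXY protocol header,
# but only from the load balancer's address
hosts_proxy = 10.0.0.5

# Matching haproxy server line: prepend the PROXY header so Exim
# learns the real client IP
#   server smtp1 10.0.0.6:587 send-proxy check
```

With hosts_proxy unset, Exim treats the "PROXY TCP4 ..." line as SMTP input sent before the greeting, which produces exactly the synchronization error quoted above.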
Re: Sticky-table persistence in a Kubernetes environment
Hi Eduardo.

Thu May 23 14:30:46 GMT+02:00 2019 Eduardo Doria Lima:
> HI Aleks,
>
> "First why do you restart all haproxies at the same time and don't use
> rolling updates ?"
>
> We restart all HAProxys at the same time because they watch the Kubernetes
> API. The ingress (https://github.com/jcmoraisjr/haproxy-ingress) does this
> automatically. I was talking with the ingress creator João Morais about the
> possibility of using a random value for the restart, but we agreed it's not
> 100% safe for keeping the table. The ingress doesn't use rolling updates
> because it's faster to reload HAProxy than to kill the entire Pod. I think.
> I will find out more about this.

João, Baptiste and I talked about this topic at the kubeconf and there was the suggestion to add the "local host" in the peers section. When a restart happens, the new haproxy process asks the old haproxy process for the data.

I don't know when João will have the time to implement the peers part.

Regards
Aleks

> "Maybe you can add a init container to update the peers in the current
> running haproxy pod's with socket commands, if possible."
>
> The problem is not updating the peers, we can do this. The problem is that
> all the peers reload at the same time.
>
> "* how often happen such a restart?"
>
> Not too often, but enough to affect some users when it occurs.
>
> "* how many entries are in the tables?"
>
> I don't know exactly, maybe between a thousand and ten thousand.
>
> Thanks!
> Att, Eduardo
> [...]
Re: Sticky-table persistence in a Kubernetes environment
Hi Aleks,

"First why do you restart all haproxies at the same time and don't use rolling updates ?"

We restart all HAProxys at the same time because they watch the Kubernetes API. The ingress (https://github.com/jcmoraisjr/haproxy-ingress) does this automatically. I was talking with the ingress creator João Morais about the possibility of using a random value for the restart, but we agreed it's not 100% safe for keeping the table. The ingress doesn't use rolling updates because it's faster to reload HAProxy than to kill the entire Pod. I think. I will find out more about this.

"Maybe you can add a init container to update the peers in the current running haproxy pod's with socket commands, if possible."

The problem is not updating the peers; we can do this. The problem is that all the peers reload at the same time.

"* how often happen such a restart?"

Not too often, but enough to affect some users when it occurs.

"* how many entries are in the tables?"

I don't know exactly, maybe between a thousand and ten thousand.

Thanks!

Att, Eduardo

On Wed, May 22, 2019 at 16:10, Aleksandar Lazic wrote:
> Hi Eduardo.
>
> That's a pretty interesting question, at least for me.
>
> First why do you restart all haproxies at the same time and don't use
> rolling updates ?
>
> https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
>
> Maybe you can add an init container to update the peers in the currently
> running haproxy pods with socket commands, if possible.
>
> https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
>
> http://cbonte.github.io/haproxy-dconv/1.9/management.html#9.3
>
> Agree with you that the peers possibility would be nice.
>
> Some other questions are:
>
> * how often happen such a restart?
> * how many entries are in the tables?
>
> I don't see anything wrong with using a "quorum" server. This is a pretty
> common solution even on container setups.
>
> Regards
> Aleks
>
> Wed May 22 15:36:10 GMT+02:00 2019 Eduardo Doria Lima <eduardo.l...@trt20.jus.br>:
>
>> Hi,
>>
>> I'm using HAProxy to support a system that was initially developed for
>> Apache (AJP) and JBoss. Now we are migrating its infrastructure to a
>> Kubernetes cluster with HAProxy as ingress (load balancer).
>>
>> The big problem is this system depends strictly on JSESSIONID. Some
>> internal requests made in Javascript or Angular don't respect browser
>> cookies and send requests only with the original JBoss JSESSIONID value.
>>
>> Because of this we need a sticky-table to map JSESSIONID values. But in a
>> cluster environment (https://github.com/jcmoraisjr/haproxy-ingress)
>> HAProxy has many instances and these instances don't have fixed IPs, they
>> are volatile.
>>
>> Also, in a Kubernetes cluster everything is in constant change and any
>> change is a reload of all HAProxy instances. So we lose the sticky-table.
>>
>> Even if we use the "peers" feature as described in this issue
>> (https://github.com/jcmoraisjr/haproxy-ingress/issues/296) by me, we
>> don't know if the table will persist because all instances will reload at
>> the same time.
>>
>> We thought to use a separate HAProxy server only to cache this table.
>> This HAProxy will never reload. But I'm not comfortable using a HAProxy
>> server instance only for this.
>>
>> I appreciate if you help me. Thanks!
>>
>> Att,
>> Eduardo
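The JSESSIONID mapping discussed in this thread is typically expressed with stick rules on the cookie; a minimal sketch, assuming a peers section named "my-peers" already exists (backend name, addresses and sizes are illustrative):

```
backend jboss
    mode http
    # map JSESSIONID values to servers; shared via peers so a
    # reload / another instance can take over the associations
    stick-table type string len 52 size 200k expire 30m peers my-peers
    stick on req.cook(JSESSIONID)
    stick store-response res.cook(JSESSIONID)
    server jboss1 10.0.0.21:8009 check
    server jboss2 10.0.0.22:8009 check
```

The store-response rule learns the JSESSIONID a server hands out, and the stick-on rule routes later requests carrying that cookie back to the same server, even when the request bypasses normal browser cookie handling.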
Re: [ANNOUNCE] haproxy-2.0-dev4
On Thu, May 23, 2019 at 01:28, Willy Tarreau wrote:
> Hi,
>
> HAProxy 2.0-dev4 was released on 2019/05/22. It added 83 new commits
> after version 2.0-dev3.
>
> This release completes the integration of a few pending features and
> the ongoing necessary cleanups before 2.0.
>
> A few bugs were addressed in the way to deal with certain connection
> errors, but overall there was nothing dramatic, which indicates we're
> stabilizing (it has been running flawlessly for 1 week now on haproxy.org).
>
> There are a few new features that were already planned. One is the support
> of event ports as an alternate (read "faster") polling method on Solaris,
> by Manu. Another one is the replacement of the slow stream processing by
> a better and more reliable watchdog. It currently only supports Linux
> however, but a FreeBSD port seems reasonably easy to do. It will detect
> inter-thread deadlocks as well as tasks stuck looping in an endless list
> which has been corrupted, and will provoke a panic, dumping all threads'
> states, then doing an abort (in hope to get a core). This will allow the
> problem to be immediately detected and even the service to be automatically
> restarted when the service manager supports it. It's also possible to
> consult all threads' states on the CLI using "show threads".
>
> As previously discussed we have also deprecated the very old req* and rsp*
> directives with warnings suggesting what to use instead. They still work
> but the goal is to kill them in 2.1, so there's no rush to convert your
> configs given that 2.0 is LTS, but you will be encouraged to progressively
> adapt your future configs. Likewise "option forceclose" now warns and
> "resolution_pool_size" is an error (it never existed in any release).
>
> WURFL is now HTX-aware. There are some new developer-friendly commands
> on the CLI when built with -DDEBUG_DEV; they allow to inspect memory
> areas or send signals, which is convenient during development. It should
> have been done earlier!
>
> Cirrus-CI is enabled to test builds on FreeBSD. To be honest, at this point
> it's still not completely clear to me how to fully use it as their interface
> is a bit limited, but it has the merit of existing. It doesn't build as often
> as Travis-CI, and it decided to build the last fix after I tagged this
> release, showing that apparently there's still a build error on FreeBSD
> that I don't understand for now.
>
> Lots of code cleanups were done, and some old build options were refreshed
> to match their equivalent makefile option.
>
> Overall, aside the possible occasional build issues here and there, it's
> expected to be a bit more stable than dev3, which I'm currently already
> satisfied with.
>
> Let's set on -dev5 around next Wednesday with the final polishing.
> Depending on the amount of issues we'll be able to decide on a release
> date.

I'd like to run sanitizers on various combinations, like ZLIB / SLZ, PCRE / PCRE2 ...

ok, let us do it before Wednesday

> Please find the usual URLs below :
>    Site index       : http://www.haproxy.org/
>    Discourse        : http://discourse.haproxy.org/
>    Slack channel    : https://slack.haproxy.org/
>    Issue tracker    : https://github.com/haproxy/haproxy/issues
>    Sources          : http://www.haproxy.org/download/2.0/src/
>    Git repository   : http://git.haproxy.org/git/haproxy.git/
>    Git Web browsing : http://git.haproxy.org/?p=haproxy.git
>    Changelog        : http://www.haproxy.org/download/2.0/src/CHANGELOG
>    Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/
>
> Willy
> ---
> Complete changelog :
>
> Bertrand Jacquin (1):
>       DOC: fix "successful" typo
>
> Christopher Faulet (1):
>       BUG/MINOR: http_fetch: Rely on the smp direction for "cookie()" and "hdr()"
>
> Emmanuel Hocdet (3):
>       BUILD: makefile: use USE_OBSOLETE_LINKER for solaris
>       BUILD: makefile: remove -fomit-frame-pointer optimisation (solaris)
>       MAJOR: polling: add event ports support (Solaris)
>
> Ilya Shipitsin (2):
>       BUILD: enable freebsd builds on cirrus-ci
>       BUILD: travis: add sanitizers to travis-ci builds
>
> Olivier Houchard (3):
>       BUG/MEDIUM: streams: Don't use CF_EOI to decide if the request is complete.
>       BUG/MEDIUM: streams: Try to L7 retry before aborting the connection.
>       BUG/MEDIUM: streams: Don't switch from SI_ST_CON to SI_ST_DIS on read0.
>
> Tim Duesterhus (3):
>       MEDIUM: Make 'option forceclose' actually warn
>       MEDIUM: Make 'resolution_pool_size' directive fatal
>       BUG/MINOR: mworker: Fix memory leak of mworker_proc members
>
> William Lallemand (1):
>       MINOR: init: setenv HAPROXY_CFGFILES
>
> Willy Tarreau (61):
>       DOC: management: place "show activity" at the right place
>       MINOR: cli/activity: show the dumping thread ID starting at 1
>       MINOR: task: export global_task_mask
>       MINOR: cli/debug: add a thread dump function
>       BUG/MINOR: debug: make ha_task_dump()
RFE: insert server into peer section
Hi.

We had an interesting discussion at Kubeconf on how a session table in peers can survive a restart of a haproxy instance.

We came to a request for enhancement (RFE): to be able to add a peer server, not a whole peers section, to an existing peers section, similar to "add server" for a backend.

Opinions?

Regards
Aleks
Re: cirrus-ci is red
On Thu, May 23, 2019 at 14:03, Willy Tarreau wrote:
> On Wed, May 22, 2019 at 07:15:21PM +0200, Willy Tarreau wrote:
> > On Wed, May 22, 2019 at 03:24:25PM +0500, Ilya Shipitsin wrote:
> > > Hello,
> > >
> > > someone is reviewing this
> > > https://github.com/haproxy/haproxy/runs/133866993 ?
> >
> > So apparently we don't have _POSIX_C_SOURCE >= 199309L there, which
> > contradicts the promise in my linux man pages :-/ The docs on opengroup
> > do not mention this define but indicate that the extension was derived
> > from another spec. I'm going to remove the version test, as I think that
> > the POSIX_TIMERS will not be set anyway in this case. We'll see if it
> > breaks anywhere else.
>
> I could build haproxy on a FreeBSD 11.1 machine. I had other issues to
> fix, but I didn't get the error on clock_gettime(). I'm not sure how
> it can happen since the defines would need to be incompatible between
> two files. I'm not even certain it's running on the last source, I can't
> find how to navigate between the builds on their interface.

I'll have a look. Probably we'll add FreeBSD 11 to the cirrus matrix.

> For now I've addressed the issues which will lead to a failure with
> timerfd_{create,settime,delete}() that I detected here in my VM. We'll
> see if Cirrus' status changes. Once it works we could try again with
> USE_RT on FreeBSD, but well, one thing at a time :-)
>
> Cheers,
> Willy
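A FreeBSD 11 entry in the Cirrus matrix could look roughly like this sketch of a .cirrus.yml (the image names and build flags are guesses, not the actual patch):

```yaml
freebsd_instance:
  matrix:
    image_family: freebsd-12-0
    image_family: freebsd-11-2

task:
  install_script: pkg install -y gmake
  script: gmake CC=clang TARGET=freebsd
```

Cirrus expands the `matrix` modifier into one task per image, so the same build script runs on both FreeBSD versions.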
Re: cirrus-ci is red
On Wed, May 22, 2019 at 07:15:21PM +0200, Willy Tarreau wrote:
> On Wed, May 22, 2019 at 03:24:25PM +0500, Ilya Shipitsin wrote:
> > Hello,
> >
> > someone is reviewing this
> > https://github.com/haproxy/haproxy/runs/133866993 ?
>
> So apparently we don't have _POSIX_C_SOURCE >= 199309L there, which
> contradicts the promise in my linux man pages :-/ The docs on opengroup
> do not mention this define but indicate that the extension was derived
> from another spec. I'm going to remove the version test, as I think that
> the POSIX_TIMERS will not be set anyway in this case. We'll see if it
> breaks anywhere else.

I could build haproxy on a FreeBSD 11.1 machine. I had other issues to fix, but I didn't get the error on clock_gettime(). I'm not sure how it can happen since the defines would need to be incompatible between two files. I'm not even certain it's running on the last source, I can't find how to navigate between the builds on their interface.

For now I've addressed the issues which will lead to a failure with timerfd_{create,settime,delete}() that I detected here in my VM. We'll see if Cirrus' status changes. Once it works we could try again with USE_RT on FreeBSD, but well, one thing at a time :-)

Cheers,
Willy
Re: SD-termination cause
Hi Maksim,

On Thu, May 23, 2019 at 10:00:19AM +0300, Maksim wrote:
> 2nd session (from haproxy to ssl-enabled backend A, dumped with tshark for
> better readability):
>
> 1 09:10:48.222518 HAPROXY -> BACKEND_A TCP 94 36568 -> 9790 [SYN] Seq=0 Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=3064071282 TSecr=0 WS=2048
> 2 09:10:48.222624 BACKEND_A -> HAPROXY TCP 94 9790 -> 36568 [SYN, ACK] Seq=0 Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=3366865490 TSecr=3064071282 WS=256
> 3 09:10:48.222639 HAPROXY -> BACKEND_A TCP 86 36568 -> 9790 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=3064071283 TSecr=3366865490
> 4 09:10:48.222658 HAPROXY -> BACKEND_A TLSv1 603 Client Hello
> 5 09:10:48.222741 BACKEND_A -> HAPROXY TCP 86 9790 -> 36568 [ACK] Seq=1 Ack=518 Win=27904 Len=0 TSval=3366865490 TSecr=3064071283
> 6 09:10:48.272165 HAPROXY -> BACKEND_A TCP 86 36568 -> 9790 [RST, ACK] Seq=518 Ack=1 Win=26624 Len=0 TSval=3064071332 TSecr=3366865490
>
> The backend didn't answer with a Server Hello within 49.5ms after the TCP
> handshake finished, for some reason. That is the root cause of the error!!!

Indeed, this is an interesting case. I'm not sure why it's reported like this, but it definitely is a corner case, as the L4 connection is established and the handshake was aborted. It should have been reported as "sC" (timeout during connect). But I can easily understand how we can make some wrong assumptions based on the available elements when reporting an error (e.g. the connection is valid, only the handshake is incomplete).

We definitely need to figure out what's happening before releasing 2.0, as it could indicate a bigger issue in the connection setup error path. By the way, I'm thinking that for 2.1 we should probably think about reporting a separate step for the server-side handshake, but that's another story.
> The last session (from haproxy to plain-http backend B): > 1 09:10:48.272235 HAPROXY -> BACKEND_B TCP 94 33532 -> 9791 [SYN] Seq=0 > Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=561683483 TSecr=0 WS=2048 > 2 09:10:48.272358 BACKEND_B -> HAPROXY TCP 94 9791 -> 33532 [SYN, ACK] Seq=0 > Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=874005989 TSecr=561683483 > WS=256 > 3 09:10:48.272369 HAPROXY -> BACKEND_B TCP 86 33532 -> 9791 [ACK] Seq=1 Ack=1 > Win=26624 Len=0 TSval=561683483 TSecr=874005989 > 4 09:10:48.272396 HAPROXY -> BACKEND_B HTTP 3590 GET /xx/xx/xxx > HTTP/1.1 > 5 09:10:48.272448 HAPROXY -> BACKEND_B TCP 86 33532 -> 9791 [FIN, ACK] > Seq=3505 Ack=1 Win=26624 Len=0 TSval=561683483 TSecr=874005989 > 6 09:10:48.272529 BACKEND_B -> HAPROXY TCP 86 9791 -> 33532 [ACK] Seq=1 > Ack=3505 Win=33792 Len=0 TSval=874005989 TSecr=561683483 > 7 09:10:48.272729 BACKEND_B -> HAPROXY TCP 86 9791 -> 33532 [FIN, ACK] Seq=1 > Ack=3506 Win=33792 Len=0 TSval=874005989 TSecr=561683483 > 8 09:10:48.272736 HAPROXY -> BACKEND_B TCP 86 33532 -> 9791 [ACK] Seq=3506 > Ack=2 Win=26624 Len=0 TSval=561683484 TSecr=874005989 > > As you can see, haproxy instance made another try to establish connection > and it did succeed but 50ms are over, and FIN was send right after > GET-request. This should never happen either, or you may quickly run out of source ports by having your ports in TIME_WAIT state :-( > Conclusion: > * Haproxy does not respond with 502 in case of timing out on ssl-connection > establishing to backends So for this case since it's a timeout, it should be a 504. > * Seems strange to me that connection timer was not reset after the first > unsuccessfull connection ("retries 1" was set) Indeed you're right, that might be the reason for the FIN just after the GET. > * SD-status of error is confusing :) I suspect there are in fact 2 or 3 issues in the outgoing connection code that result in all of this. 
This code is very complex since it has to deal with reuse, server pools
and redispatch at the same time. We need to have a look into this. I'll
wait for Olivier's availability since he knows this area better
(especially the reuse stuff, which I would break just by approaching it).

Many thanks for your detailed traces and analysis, this is very
informative!

Willy
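The timeout arithmetic discussed in this thread can be checked directly
from the capture timestamps. A small illustrative Python sketch (the
timestamps are copied from the traces quoted above; the variable names are
mine, not from the thread):

```python
# Timestamps in seconds, taken from the captures quoted in this thread
# (the "09:10:" prefix is dropped since only deltas matter).
tls_client_hello     = 48.222658  # packet 4, session to BACKEND_A
tls_rst              = 48.272165  # packet 6, session to BACKEND_A (RST)
client_request_acked = 48.222458  # packet 5, client-side session
client_fin_sent      = 48.272790  # packet 6, client-side session (FIN)

# HAProxy gave up on the TLS handshake ~49.5ms after the Client Hello,
# and closed the client connection ~50.3ms after acknowledging the
# request -- both consistent with "timeout connect 50" (milliseconds).
print(round((tls_rst - tls_client_hello) * 1000, 1))          # 49.5
print(round((client_fin_sent - client_request_acked) * 1000, 1))  # 50.3
```

This only confirms what Maksim already measured: the abort lines up with
the 50ms connect timeout, not with any server-side error.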
Re: SD-termination cause
Hi, Willy!

This kind of error only happens in proxy sections with ssl-enabled
backends ('ssl verify none' in server lines). In order to find out what
really happens from a network point of view, I added one plain-http
backend to one of the proxy sections. Then I captured the situation when
a request failed on this plain-http backend.

Interesting parameters from the config:

  timeout connect 50
  timeout queue 1s
  retries 1

Server lines look like this:

  default-server weight 50 on-error fastinter
  server BACKEND_A:9790 10.10.10.10:9790 weight 100 check ssl verify none observe layer7
  server BACKEND_B:9791 10.10.10.11:9791 weight 100 check observe layer7

Now I'll show you the 3 tcp sessions I've captured.

The first session (from client to haproxy instance):

1 09:10:48.222378 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [S], seq
  3359830899, win 43690, options [mss 65476,sackOK,TS val 3131804957 ecr
  0,nop,wscale 11], length 0
2 09:10:48.222388 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [S.], seq
  1294278968, ack 3359830900, win 43690, options [mss 65476,sackOK,TS val
  3131804957 ecr 3131804957,nop,wscale 11], length 0
3 09:10:48.222397 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [.], ack 1,
  win 22, options [nop,nop,TS val 3131804957 ecr 3131804957], length 0
4 09:10:48.222449 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [P.], seq
  1:3505, ack 1, win 22, options [nop,nop,TS val 3131804957 ecr
  3131804957], length 3504
5 09:10:48.222458 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [.], ack
  3505, win 86, options [nop,nop,TS val 3131804957 ecr 3131804957], length 0
6 09:10:48.272790 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [F.], seq 1,
  ack 3505, win 86, options [nop,nop,TS val 3131805008 ecr 3131804957],
  length 0
7 09:10:48.272836 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [F.], seq
  3505, ack 2, win 22, options [nop,nop,TS val 3131805008 ecr 3131805008],
  length 0
8 09:10:48.272844 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [.], ack
  3506, win 86, options [nop,nop,TS val 3131805008 ecr 3131805008], length 0

As you can see, the client sent a request to the haproxy instance (packet
#4). The instance acknowledged it (packet #5). And then, 50.332ms later,
haproxy answered with a FIN carrying no data (packet #6, "length 0").

2nd session (from haproxy to ssl-enabled backend A, dumped with tshark for
better readability):

1 09:10:48.222518 HAPROXY → BACKEND_A TCP 94 36568 → 9790 [SYN] Seq=0
  Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=3064071282 TSecr=0 WS=2048
2 09:10:48.222624 BACKEND_A → HAPROXY TCP 94 9790 → 36568 [SYN, ACK] Seq=0
  Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=3366865490
  TSecr=3064071282 WS=256
3 09:10:48.222639 HAPROXY → BACKEND_A TCP 86 36568 → 9790 [ACK] Seq=1
  Ack=1 Win=26624 Len=0 TSval=3064071283 TSecr=3366865490
4 09:10:48.222658 HAPROXY → BACKEND_A TLSv1 603 Client Hello
5 09:10:48.222741 BACKEND_A → HAPROXY TCP 86 9790 → 36568 [ACK] Seq=1
  Ack=518 Win=27904 Len=0 TSval=3366865490 TSecr=3064071283
6 09:10:48.272165 HAPROXY → BACKEND_A TCP 86 36568 → 9790 [RST, ACK]
  Seq=518 Ack=1 Win=26624 Len=0 TSval=3064071332 TSecr=3366865490

The backend didn't answer with a Server Hello within 49.5ms of the TCP
handshake finishing, for some reason. That is the root cause of the error!
The last session (from haproxy to plain-http backend B):

1 09:10:48.272235 HAPROXY → BACKEND_B TCP 94 33532 → 9791 [SYN] Seq=0
  Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=561683483 TSecr=0 WS=2048
2 09:10:48.272358 BACKEND_B → HAPROXY TCP 94 9791 → 33532 [SYN, ACK] Seq=0
  Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=874005989
  TSecr=561683483 WS=256
3 09:10:48.272369 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [ACK] Seq=1
  Ack=1 Win=26624 Len=0 TSval=561683483 TSecr=874005989
4 09:10:48.272396 HAPROXY → BACKEND_B HTTP 3590 GET /xx/xx/xxx HTTP/1.1
5 09:10:48.272448 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [FIN, ACK]
  Seq=3505 Ack=1 Win=26624 Len=0 TSval=561683483 TSecr=874005989
6 09:10:48.272529 BACKEND_B → HAPROXY TCP 86 9791 → 33532 [ACK] Seq=1
  Ack=3505 Win=33792 Len=0 TSval=874005989 TSecr=561683483
7 09:10:48.272729 BACKEND_B → HAPROXY TCP 86 9791 → 33532 [FIN, ACK] Seq=1
  Ack=3506 Win=33792 Len=0 TSval=874005989 TSecr=561683483
8 09:10:48.272736 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [ACK] Seq=3506
  Ack=2 Win=26624 Len=0 TSval=561683484 TSecr=874005989

As you can see, the haproxy instance made another try to establish a
connection, and it did succeed, but the 50ms were over and a FIN was sent
right after the GET request.

Conclusion:
* Haproxy does not respond with 502 when timing out on ssl-connection
  establishment to backends
* It seems strange to me that the connection timer was not reset after the
  first unsuccessful connection ("retries 1" was set)
* The SD status of the error is confusing :)

--
Best regards,
Maksim Kupriianov

On Thu, 23 May 2019 at 06:40, Willy Tarreau :

> Hi Maksim,
>
> On Tue, May 21, 2019 at 01:47:30PM +0300, ?? ? wrote:
> > Hi!
> >
> > I've run into some weird problem of many connections failing with SD
> > status in the log. And I
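For reference, the traces above show the server-side TLS handshake being
aborted at ~49.5ms, right at the "timeout connect 50" budget (a unitless
HAProxy timeout is in milliseconds), so one obvious mitigation while the
reporting bugs are investigated is simply to give connection setup more
headroom. A hypothetical variant of the settings quoted in this thread --
the 500ms value is illustrative only, not a recommendation from the
thread:

```
# Hypothetical tuning sketch, not from the original config: the captures
# show the TLS handshake must also complete within the connect budget.
defaults
    timeout connect 500ms   # was 50 (ms); leaves room for TCP + TLS setup
    timeout queue   1s
    retries         1
```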
Re: do we consider using patchwork ?
Hi Aleks,

On Thu, May 23, 2019 at 08:05:18AM +0200, Aleksandar Lazic wrote:
> From my point of view the CI and issue tracker are a good step forward,
> but for now we should try to focus on the list as it is still the main
> communication channel.

I mean, there are multiple valid communication channels, since there are
multiple communications going on at the same time. For example, Lukas is
doing an awesome job at helping people on Discourse and only brings
qualified issues here, so that I almost never have to go there. Same with
the github issues that we've wanted to have for quite some time without me
being at the center of this: they work pretty well right now, and if
something is ignored for too long, someone will ping here about the
problem, so it works remarkably well.

> How about adding mail forward and reply into the tools?

Given that there are people who manage to sort this info first, I'd rather
not for now. This is less stuff to concentrate on. For me it is very
important to have a set of trusted people who I know do the right thing,
because when an issue is escalated here, or when I get a patch that was
said to be validated against a GitHub issue, in general I apply it without
looking at it, and it's a big relief not to have to review a patch.

> We can then use the mail clients for communication and the tools will
> receive the answers automatically.

Someone would have to set this up, and possibly to develop a bot like
Lukas did for the PRs. At the moment the stuff is more or less well
balanced; it's just that we have added lots of useful tools in a short
time, and these still need to be cared about because, well, it's the
beginning.

Also, for me it's important that we don't forget the real goal: the goal
is to improve haproxy, not to improve the tools. If improving the tools
improves haproxy, fine. But the tools are not the goal, they are a way to
reach the goal faster. For example, the CI is very useful since we now
detect build breakage much earlier.
However, we need to keep in mind that it's an indication that we broke
something; it must not become a goal to have green lights all the time. If
something is broken on a platform (even an important one) because of an
ongoing change, and we consider it more important to finish the changes
than to fix that platform, I'm perfectly fine with this. The CI eases the
developers' work by giving them the feedback they need without having to
actively re-test their changes everywhere (and Travis is particularly good
at this because it triggers a build almost instantly after a push, so the
feedback loop is very fast).

For example, the FreeBSD build breakage that Cirrus reported after the
recent watchdog changes annoys me, not because Cirrus is red, which I
don't care about, but because it's still broken after the fix that I
thought valid, and now I know that the supposedly valid POSIX defines I
used to detect support are not portable. So I will need to be extra
careful about this and fix it while it's still fresh in my head.

So let's just put a pause on the tooling improvements so that we still
have a bit of time available for the code, and see what can be improved in
3-6 months, once we feel that things are going well apart from a few
points that need to be addressed. It will save us from wasting time on
mistakes.

Cheers,
Willy
Re: do we consider using patchwork ?
Hi.

Wed May 22 23:41:13 GMT+02:00 2019 Willy Tarreau :
> Hi Ilya,
>
> On Thu, May 23, 2019 at 01:29:53AM +0500, ??? wrote:
> > Hello,
> >
> > if we do not like using github PRs and Willy receives 2k emails a day...
> > do we consider using something like
> > https://patchwork.openvpn.net/project/openvpn2/list/ ?
>
> At least not now, please let's slow down on process changes, I cannot
> catch up anymore. Really. I find myself spending 10 times more time in
> a browser than what I used to do 6 months ago; for me it's becoming
> very difficult. Between the issue tracker, the CI, github settings, the
> links to dumps, confs or logs that are lazily copy-pasted instead of
> sending the info itself, etc., in the end I find myself working far
> less efficiently for now, having to spend more time at work to produce
> the same. However, it helps others work more efficiently, which is nice.
> But since I've always been a bottleneck, it remains important that we
> don't forget to optimize my time, or everyone will spend their time
> waiting for me, which I cannot accept. And the worst that can happen
> is that I become a bottleneck due to processes, because that would be
> something I wouldn't be able to improve at all.

Full ack.

From my point of view the CI and issue tracker are a good step forward,
but for now we should try to focus on the list as it is still the main
communication channel.

How about adding mail forward and reply into the tools? We can then use
the mail clients for communication and the tools will receive the answers
automatically.

Jm2c

> I hope you can understand that no change comes with zero cost and that
> for some people (like me) they come with a higher cost. Sometimes this
> cost can be recovered over time, sometimes it's a pure loss. So let's
> not engage too many changes at once and keep some time to observe the
> outcome of everything we've done so far.
>
> Cheers,
> Willy

Regards
Aleks