Observations about reloads and DNS SRV records
I've been testing DNS service discovery and the use of SRV records and have a few thoughts on a couple things that I noticed. 1. Reloading with SRV records ignores server-state-file - While this is not a huge deal, it does mean that the backend in question becomes unavailable when the proxy is reloaded until the SRV and subsequent A records are resolved - I understand that the point of these records is to dynamically build the backend, and populating servers from outdated and potentially invalid information is not ideal, but it might be nice to seed the backend before the first SRV query returns - Below is the related config since it is very likely that I have misconfigured something :) globals ... server-state-base /var/lib/haproxy server-state-file state defaults ... load-server-state-from-file global default-server init-addr last,libc,none ... resolvers sd nameserver sd 127.0.0.1: backend dynamic server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd resolve-prefer ipv4 check 2. Additional record responses from the nameserver are not parsed - This just means that any servers that are populated from the SRV records require a second round of querying for each of the hosts after the fqdn is stored. It might be more efficient if these records are also parsed but I can see that it might be pretty challenging in the current DNS resolver - Only reason I thought of this was to try and reduce up the time it takes to populate the backend servers with addresses in an effort to lessen the effects of #1 I'm happy with the workaround I'll be pursing for now where my SD service (that originally was going to be a resolver and populate via SRV records) is going to write all the backend definitions to disk so this is not a pressing issue, just thought I'd share the limitations I discovered. My knowledge of C (and the internal workings of HAproxy) is not great otherwise this would probably be a patch submission for #1 :) Tait
Re: Observations about reloads and DNS SRV records
Hi, Forget the backend id, it's the wrong answer to that problem. I was investigating an other potential issue, but this does not fix the original problem reported here. Here is the answer I delivered today on discourse, where other people have also reported the same issue: Just to let you know that I think I found the cause of the issue but I don’t have a fix yet. I’ll come back to you this week with more info and hopefully a fix. The issue seem to be in srv_init_addr(), because srv->hostname is not set (null). Baptiste
Re: Observations about reloads and DNS SRV records
Hi, Actually, the problem was deeper than my first thought. In its current state, statefile and SRV records are simply not compatible. I had to add a new field in the state file format to add support to this. Could you please confirm the patch attached fixes your issues? Baptiste On Mon, Jun 25, 2018 at 11:48 AM, Baptiste wrote: > Hi, > > Forget the backend id, it's the wrong answer to that problem. > I was investigating an other potential issue, but this does not fix the > original problem reported here. > > Here is the answer I delivered today on discourse, where other people have > also reported the same issue: > >Just to let you know that I think I found the cause of the issue but I > don’t have a fix yet. >I’ll come back to you this week with more info and hopefully a fix. >The issue seem to be in srv_init_addr(), because srv->hostname is not > set (null). > > Baptiste > > > From 6899b19b9686b6dadc65b89adfb32c8792004663 Mon Sep 17 00:00:00 2001 From: Baptiste Assmann Date: Mon, 2 Jul 2018 17:00:54 +0200 Subject: [PATCH 1/2] BUG/MEDIUM: dns: fix incomatibility between SRV resolution and server state file Server state file has no indication that a server is currently managed by a DNS SRV resolution. And thus, both feature (DNS SRV resolution and server state), when used together, does not provide the expected behavior: a smooth experience... This patch introduce the "SRV record name" in the server state file and loads and applies it if found and wherever required. --- include/types/server.h | 7 --- src/proxy.c| 10 -- src/server.c | 45 + 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/include/types/server.h b/include/types/server.h index 0cd20c0..29b88f8 100644 --- a/include/types/server.h +++ b/include/types/server.h @@ -126,10 +126,11 @@ enum srv_initaddr { "bk_f_forced_id " \ "srv_f_forced_id "\ "srv_fqdn " \ -"srv_port" +"srv_port"\ +"srvrecord" -#define SRV_STATE_FILE_MAX_FIELDS 19 -#define SRV_STATE_FILE_NB_FIELDS_VERSION_1 18 +#define SRV_STATE_FILE_MAX_FIELDS 20 +#define SRV_STATE_FILE_NB_FIELDS_VERSION_1 19 #define SRV_STATE_LINE_MAXLEN 512 /* server flags -- 32 bits */ diff --git a/src/proxy.c b/src/proxy.c index c262966..c1c41ba 100644 --- a/src/proxy.c +++ b/src/proxy.c @@ -1429,6 +1429,7 @@ static int dump_servers_state(struct stream_interface *si, struct chunk *buf) char srv_addr[INET6_ADDRSTRLEN + 1]; time_t srv_time_since_last_change; int bk_f_forced_id, srv_f_forced_id; + char *srvrecord; /* we don't want to report any state if the backend is not enabled on this process */ if (px->bind_proc && !(px->bind_proc & pid_bit)) @@ -1458,18 +1459,23 @@ static int dump_servers_state(struct stream_interface *si, struct chunk *buf) bk_f_forced_id = px->options & PR_O_FORCED_ID ? 1 : 0; srv_f_forced_id = srv->flags & SRV_F_FORCED_ID ? 1 : 0; + srvrecord = NULL; + if (srv->srvrq && srv->srvrq->name) + srvrecord = srv->srvrq->name; + chunk_appendf(buf, "%d %s " "%d %s %s " "%d %d %d %d %ld " "%d %d %d %d %d " -"%d %d %s %u" +"%d %d %s %u %s" "\n", px->uuid, px->id, srv->puid, srv->id, srv_addr, srv->cur_state, srv->cur_admin, srv->uweight, srv->iweight, (long int)srv_time_since_last_change, srv->check.status, srv->check.result, srv->check.health, srv->check.state, srv->agent.state, -bk_f_forced_id, srv_f_forced_id, srv->hostname ? srv->hostname : "-", srv->svc_port); +bk_f_forced_id, srv_f_forced_id, srv->hostname ? srv->hostname : "-", srv->svc_port, +srvrecord ? srvrecord : "-"); if (ci_putchk(si_ic(si), &trash) == -1) { si_applet_cant_put(si); return 0; diff --git a/src/server.c b/src/server.c index 277d140..cb13793 100644 --- a/src/server.c +++ b/src/server.c @@ -2678,6 +2678,7 @@ static void srv_update_state(struct server *srv, int version, char **params) const char *fqdn; const char *port_str; unsigned int port; + char *srvrecord; fqdn = NULL; port = 0; @@ -2701,6 +2702,7 @@ static void srv_update_state(struct server *srv, int version, char **params) * srv_f_forced_id: params[12] * srv_fqdn: params[13] * srv_port: params[14] + * srvrecord:params[15] */ /* validating srv_op_state */ @@ -2833,6 +2835,13 @@ static void srv_update_state(struct server *srv, int version, char **params) } } + /* SRV record + * NOTE: in HAProxy, SRV records must start with an underscore '_' + */ + srvrecord = params[15]; + if (srvrecord && *srvrecord != '_') +srvrecord = NULL; + /* don't apply anything if one error has been detected */ if (msg->len) goto out; @@ -2965,6 +2974,41 @@ static void srv_update_state(struct server *srv, int version, char **params) } } } + /*
Re: Observations about reloads and DNS SRV records
Hey Baptiste, I’ll try it out next week when I get back (currently on vacation) and let you know. Thanks! Tait On Tue., Jul. 3, 2018 at 06:24 Baptiste wrote: > Hi, > > Actually, the problem was deeper than my first thought. > In its current state, statefile and SRV records are simply not compatible. > I had to add a new field in the state file format to add support to this. > > Could you please confirm the patch attached fixes your issues? > > Baptiste > > > > On Mon, Jun 25, 2018 at 11:48 AM, Baptiste wrote: > >> Hi, >> >> Forget the backend id, it's the wrong answer to that problem. >> I was investigating an other potential issue, but this does not fix the >> original problem reported here. >> >> Here is the answer I delivered today on discourse, where other people >> have also reported the same issue: >> >>Just to let you know that I think I found the cause of the issue but I >> don’t have a fix yet. >>I’ll come back to you this week with more info and hopefully a fix. >>The issue seem to be in srv_init_addr(), because srv->hostname is not >> set (null). >> >> Baptiste >> >> >> >
Re: Observations about reloads and DNS SRV records
This is related to the discourse thread (And also discuss the issues in ML instead of discourse). https://discourse.haproxy.org/t/config-reload-with-dynamic-service-discovery-via-dns/2625/10 Here are the findings from my ends: 1 - State file / Port range invalid conversion In the haproxy state file the Port is written as an unsigned long value.. ie: Port 32777 is actually written 4294934537 112 defaultback_failsaife 1 varnish1 10.100.20.78 2 0 1 1 203910 15 3 4 6 0 0 0 ip-10-100-20-78.node.aws-us-east-1.consul 4294934537 _tcp_.varnish .service.consul To allow the seamless reload, I had to comment the following lines in the src/server.c if (port > USHRT_MAX) { chunk_appendf(msg, “, invalid srv_port value ‘%s’”, port_str); port_str = NULL; } 2 - Backend responds with 503 only after reload In order to make it functional, in the patch i had to comment the following lines : /* // prepare DNS resolution for this server (but aint this has already been done by the server-template function?) res = srv_prepare_for_resolution(srv, fqdn); if (res == -1) { ha_alert(“could not allocate memory for DNS REsolution for server … ‘%s’\n”, srv->id); chunk_appendf(msg, “, can’t allocate memory for DNS resolution for server ‘%s’”, srv->id); HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock); goto out; } */
Re: Observations about reloads and DNS SRV records
Hi Tait. On 06/06/2018 11:16, Tait Clarridge wrote: I've been testing DNS service discovery and the use of SRV records and have a few thoughts on a couple things that I noticed. In this area was a lot of changes in the last version of haproxy, do you have tested the setup with the latest haproxy 1.8 release? Please can you be so kind and send us the output of haproxy -vv 1. Reloading with SRV records ignores server-state-file - While this is not a huge deal, it does mean that the backend in question becomes unavailable when the proxy is reloaded until the SRV and subsequent A records are resolved - I understand that the point of these records is to dynamically build the backend, and populating servers from outdated and potentially invalid information is not ideal, but it might be nice to seed the backend before the first SRV query returns - Below is the related config since it is very likely that I have misconfigured something :) globals ... server-state-base /var/lib/haproxy server-state-file state defaults ... load-server-state-from-file global default-server init-addr last,libc,none ... resolvers sd nameserver sd 127.0.0.1: backend dynamic server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd resolve-prefer ipv4 check 2. Additional record responses from the nameserver are not parsed - This just means that any servers that are populated from the SRV records require a second round of querying for each of the hosts after the fqdn is stored. It might be more efficient if these records are also parsed but I can see that it might be pretty challenging in the current DNS resolver - Only reason I thought of this was to try and reduce up the time it takes to populate the backend servers with addresses in an effort to lessen the effects of #1 I'm happy with the workaround I'll be pursing for now where my SD service (that originally was going to be a resolver and populate via SRV records) is going to write all the backend definitions to disk so this is not a pressing issue, just thought I'd share the limitations I discovered. My knowledge of C (and the internal workings of HAproxy) is not great otherwise this would probably be a patch submission for #1 :) Tait
Re: Observations about reloads and DNS SRV records
On Wed, Jun 6, 2018 at 11:35 AM Aleksandar Lazic wrote: > Hi Tait. > > On 06/06/2018 11:16, Tait Clarridge wrote: > >I've been testing DNS service discovery and the use of SRV records and > have > >a few thoughts on a couple things that I noticed. > > In this area was a lot of changes in the last version of haproxy, do you > have > tested the setup with the latest haproxy 1.8 release? > > Please can you be so kind and send us the output of > > haproxy -vv > > Hey Aleksandar, I'm on 1.8.8 so I will try to give it a shot with 1.8.9 later this afternoon. I've added the output below. HA-Proxy version 1.8.8 2018/04/19 Copyright 2000-2018 Willy Tarreau Build options : TARGET = linux2628 CPU = generic CC = gcc CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -fno-strict-overflow -Wno-format-truncation -Wno-null-dereference -Wno-unused-label OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1 Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 Built with OpenSSL version : OpenSSL 1.1.0h-fips 27 Mar 2018 Running on OpenSSL version : OpenSSL 1.1.0g-fips 2 Nov 2017 OpenSSL library supports TLS extensions : yes OpenSSL library supports SNI : yes OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 Built with Lua version : Lua 5.3.4 Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND Encrypted password support via crypt(3): yes Built with multi-threading support. Built with PCRE version : 8.42 2018-03-20 Running on PCRE version : 8.41 2017-07-05 PCRE library supports JIT : no (USE_PCRE_JIT not set) Built with zlib version : 1.2.11 Running on zlib version : 1.2.11 Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip") Built with network namespace support. Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. Available filters : [SPOE] spoe [COMP] compression [TRACE] trace Thanks, Tait
Re: Observations about reloads and DNS SRV records
Hi Tait A few comments inline: 1. Reloading with SRV records ignores server-state-file > - While this is not a huge deal, it does mean that the backend in > question becomes unavailable when the proxy is reloaded until the SRV and > subsequent A records are resolved > - I understand that the point of these records is to dynamically build > the backend, and populating servers from outdated and potentially invalid > information is not ideal, but it might be nice to seed the backend before > the first SRV query returns > This should not happen and it's a known issue that we're working on. > - Below is the related config since it is very likely that I have > misconfigured something :) > > globals > ... > server-state-base /var/lib/haproxy > server-state-file state > > defaults > ... > load-server-state-from-file global > default-server init-addr last,libc,none > > ... > > resolvers sd > nameserver sd 127.0.0.1: > > backend dynamic > server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd > resolve-prefer ipv4 check > > 2. Additional record responses from the nameserver are not parsed > This is true, by design. > - This just means that any servers that are populated from the SRV > records require a second round of querying for each of the hosts after the > fqdn is stored. It might be more efficient if these records are also parsed > but I can see that it might be pretty challenging in the current DNS > resolver > - Only reason I thought of this was to try and reduce up the time it > takes to populate the backend servers with addresses in an effort to lessen > the effects of #1 > Actually, I tested many DNS server and some of them simply did not send the additional records when they could not fit in the response (too small payload for the number of SRV records). Technically, we could try to use additional records if available and then failover to current way of working if none found. > I'm happy with the workaround I'll be pursing for now where my SD service > (that originally was going to be a resolver and populate via SRV records) > is going to write all the backend definitions to disk so this is not a > pressing issue, just thought I'd share the limitations I discovered. My > knowledge of C (and the internal workings of HAproxy) is not great > otherwise this would probably be a patch submission for #1 :) > > Tait > > I'll check that for you. (In the mean time, please keep on answering to Aleksandar emails, the more info I'll have, the best). Baptiste
Re: Observations about reloads and DNS SRV records
Hi Baptiste, thanks for the response. On Wed, Jun 6, 2018 at 6:32 PM Baptiste wrote: > > This should not happen and it's a known issue that we're working on. > > Excellent, figured you guys were probably already aware of it. Let me know if I can assist in testing. > > Actually, I tested many DNS server and some of them simply did not send > the additional records when they could not fit in the response (too small > payload for the number of SRV records). > Technically, we could try to use additional records if available and then > failover to current way of working if none found. > > True, not a lot do and I don't have a lot of hosts. I assumed it already parsed them based on reading https://www.haproxy.com/documentation/aloha/9-5/traffic-management/lb-layer7/dns-srv-records/ and took that into account when writing my DNS service. I'm a little swamped with other work at the moment, but when I get a chance I would be able to provide a DNS server (written in Go) that returns additional records to test with if that helps. > > >> I'm happy with the workaround I'll be pursing for now where my SD service >> (that originally was going to be a resolver and populate via SRV records) >> is going to write all the backend definitions to disk so this is not a >> pressing issue, just thought I'd share the limitations I discovered. My >> knowledge of C (and the internal workings of HAproxy) is not great >> otherwise this would probably be a patch submission for #1 :) >> >> Tait >> >> > I'll check that for you. (In the mean time, please keep on answering to > Aleksandar emails, the more info I'll have, the best). > > Baptiste > Thanks again, I don't have a lot of time to do any testing right now but hope to soon.
Re: Observations about reloads and DNS SRV records
Hi Tait, 1. Reloading with SRV records ignores server-state-file > Can you tell me more about this one. How do you see that? I mean, I have a conf similar to yours and I can see HAProxy parsing the server state file (and returning a bunch of warning I'm working on about backend name and ID mismatch). ==> while writing this mail, I am able to reproduce the issue I think: - start HAProxy with SRV records - dump servers state - update haproxy conf to prevent dns resolution at runtime - reload haproxy ==> my servers are now "unconfigured"... (no IP, no FQDN, nothing from the servers state file) - While this is not a huge deal, it does mean that the backend in > question becomes unavailable when the proxy is reloaded until the SRV and > subsequent A records are resolved > Well, this is the role of the server state file: populating server info at configuration parsing time so you don't have to wait until first DNS resolution to get servers' information. ==> this might be caused by the bug found above. 2. Additional record responses from the nameserver are not parsed > - This just means that any servers that are populated from the SRV > records require a second round of querying for each of the hosts after the > fqdn is stored. It might be more efficient if these records are also parsed > but I can see that it might be pretty challenging in the current DNS > resolver > - Only reason I thought of this was to try and reduce up the time it > takes to populate the backend servers with addresses in an effort to lessen > the effects of #1 > > I'll work on this one as soon as I fixed the bug above/ Baptiste
Re: Observations about reloads and DNS SRV records
> > I'm a little swamped with other work at the moment, but when I get a > chance I would be able to provide a DNS server (written in Go) that returns > additional records to test with if that helps. > Thanks for giving me this idea! I wrote a quick and inflexible one here: https://github.com/bedis/dnsserver So feel free to contribute to it or write your own :) I'm going to use it to troubleshoot the issue you reported. That said, nothing is better than other real DNS servers (bind / unbound / powerdns and others) for real production. Baptiste
Re: Observations about reloads and DNS SRV records
Hey Baptiste, On Sun, Jun 10, 2018 at 3:19 AM Baptiste wrote: > > ==> while writing this mail, I am able to reproduce the issue I think: > - start HAProxy with SRV records > - dump servers state > - update haproxy conf to prevent dns resolution at runtime > - reload haproxy > ==> my servers are now "unconfigured"... (no IP, no FQDN, nothing from the > servers state file) > > That is exactly the issue I was seeing, my apologies for not describing the actual issue I was seeing when I opened this thread. >> > 2. Additional record responses from the nameserver are not parsed >> - This just means that any servers that are populated from the SRV >> records require a second round of querying for each of the hosts after the >> fqdn is stored. It might be more efficient if these records are also parsed >> but I can see that it might be pretty challenging in the current DNS >> resolver >> - Only reason I thought of this was to try and reduce up the time it >> takes to populate the backend servers with addresses in an effort to lessen >> the effects of #1 >> >> > I'll work on this one as soon as I fixed the bug above/ > Great! The first one will go a long way in making reloads pretty seamless for DNS, at least for my use case :). Thanks again, Tait
Re: Observations about reloads and DNS SRV records
On Sun, Jun 10, 2018 at 3:22 AM Baptiste wrote: > > Thanks for giving me this idea! > I wrote a quick and inflexible one here: > https://github.com/bedis/dnsserver > So feel free to contribute to it or write your own :) > > I'm going to use it to troubleshoot the issue you reported. That said, > nothing is better than other real DNS servers (bind / unbound / powerdns > and others) for real production. > Looks good! My application is using the same DNS library (miekg/dns) which seems to be the most well-rounded and mature for go. If I think of anything to add I'll open a PR but it is currently exactly how I am returning my results. > > Baptiste >
Re: Observations about reloads and DNS SRV records
Hi Tait, There is a first thing you can do to prevent this from happening, but this might be complicated to implement. If you force a backend id in the configuration file, the problem won't exist anymore... backend be server-template srv 5 _http._tcp.srv.tld resolvers dnsserver resolve-prefer ipv4 check id 5 backend be2 server-template srv 5 _http._tcp.srv.tld resolvers dnsserver resolve-prefer ipv4 check id 4 When I change the order of those 2 backends, my config reloads properly. Note that I'm currently investigating a way to make this happen without enforcing the backend id. Baptiste