Hi Conrad,
> first of all: great to see that this is making progress! I am very
> excited about everything related to SRV records and also server-templates.
> I tested a fresh master build with these patches applied, here are my
> observations:

Thanks a lot for taking the time to test and report your findings!

> On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
> >
> > Hi All
> >
> > So, I enabled the latest (brilliant) contribution from Olivier into my
> > Kubernetes cluster and I discovered it did not work as expected.
> > After digging into the issues, I found 3 bugs directly related to the
> > way SRV records must be read and processed by HAProxy.
> > It was clearly hard to spot them outside a real orchestrator :)
> >
> > Please find in attachment 3 patches to fix them.
> >
> > Please note that I might have found another bug, which I'll dig into
> > later.
> > When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
> > considers some servers (pods in Kubernetes) in error "no dns
> > resolution". This is normal. What is not normal is that those servers
> > never ever come back to life, even when I scale up again.
> > Note that thanks to (Salut) Fred's contribution about server-templates
> > some time ago, we can do some very cool fancy configurations like the
> > one below (I have a headless service called 'red' in my Kubernetes, it
> > points to my 'red' application):
> >
> >   backend red
> >     server-template red 20 _http._tcp.red.default.svc.cluster.local:8080
> >       inter 1s resolvers kube check
> >
> > In one line, we can enable automatic "scaling follow-up" in HAProxy.

> I tried a very similar setup, like this:
>
> > resolvers servicediscovery
> >   nameserver dns1 10.33.60.31:53
> >   nameserver dns2 10.33.19.32:53
> >   nameserver dns3 10.33.25.28:53
> >
> >   resolve_retries 3
> >   timeout retry   1s
> >   hold valid      10s
> >   hold obsolete   5s
> >
> > backend testbackend
> >   server-template test 20 http.web.production.<internal-name>:80 check
>
> This is the first time I am testing the server-template keyword at all,
> but I immediately noticed that I sometimes get a rather uneven
> distribution of pods, e.g. this (with the name resolving to 5 addresses):
>
> > $ echo "show servers state testbackend" | \
> >     nc localhost 2305 | grep testbackend | \
> >     awk '{print $5}' | sort | uniq -c
> >   7 10.146.112.130
> >   6 10.146.148.92
> >   3 10.146.172.225
> >   4 10.146.89.208
>
> This uses only four of the five servers, with a quite uneven distribution.
> Other attempts do use all five servers, but the distribution still seems
> pretty uneven most of the time. Is that intentional? Is the list populated
> randomly?

Nope, each IP read in the response should be assigned to a single server.
If that IP disappears from the responses, then the server will be
considered DOWN after some time.
If new IPs arrive, then they will be assigned to available (or DOWN)
servers.

> Then, nothing changed when I scaled up or down (except the health checks
> taking some servers offline), but the addresses were never updated. Is
> that the bug you mentioned, or am I doing it wrong?

Well, you're supposed to see some changes, but as I said in my previous
mail, we seem to have one last bug to fix: some servers that go DOWN
during a scale-in never come back up during the next scale-out...
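If it helps while chasing this, a quick way to compare what the nameserver
announces with what HAProxy currently has mapped could be something like
the following (just a sketch, reusing the names, nameserver and stats
socket port from your own example, and assuming dig is available on the
box):

  # SRV records the nameserver currently returns for the service
  # (the additional section should carry the matching A records)
  dig SRV http.web.production.<internal-name> @10.33.60.31

  # addresses HAProxy has assigned to the template servers
  # (in "show servers state", column 4 is the server name, column 5 its
  # address, as in your awk command above)
  echo "show servers state testbackend" | nc localhost 2305 | \
      grep testbackend | awk '{print $4, $5}' | sort -k2

An IP listed by dig but mapped to no server, or a server keeping an IP the
nameserver stopped returning for longer than "hold obsolete", would be a
good hint of where the bug sits.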
> Also, as more of a side note, we do use SRV records, but without
> underscores in the names, which I realize is not very common, but also
> not exactly forbidden (as far as I understand the RFC it's more of a
> suggestion).

I tend to disagree: https://www.ietf.org/rfc/rfc2782.txt

=======8<======
   The format of the SRV RR

   Here is the format of the SRV RR, whose DNS type code is 33:

        _Service._Proto.Name TTL Class SRV Priority Weight Port Target
=======8<======

That said, Kubernetes seems to be tolerant: my SRV query for
_http._tcp.red.default.svc.cluster.local returns the same result as one
for red.default.svc.cluster.local.
From the Kubernetes documentation, it seems that they first implemented
the version without the underscores and kept it for compatibility
purposes...
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

=====8<=====
   Backwards compatibility

   Previous versions of kube-dns made names of the form
   my-svc.my-namespace.cluster.local (the ‘svc’ level was added later).
   This is no longer supported.
=====8<=====

> Would be great if this could be indicated in some way in the config
> maybe.

Well, I don't agree, as explained above :)
That said, technically, this may be doable by playing with the
"resolve-prefer" parameter. For now it accepts only 'ipv4' and 'ipv6',
but we could add 'srv'...
I expect some feedback from the community on this particular point.

> And lastly, I know this isn't going to be solved on a Friday afternoon,
> but I'll let you know that our infrastructure has reached a scale where
> DNS over UDP almost never cuts it anymore (due to the amount of records
> returned), and I think many people who are turning to e.g. Kubernetes do
> so because they have to operate at such scale, so my guess is this might
> be one of the more frequently requested features at some point :)

May I ask how many records you can return at most?

Olivier implemented a "time to live" for the records in the cache: a pod's
IP must be missing from the responses for the "hold obsolete" period of
time before the server associated with it is considered DOWN.
So even if the whole set of servers can't fit in a single response, with
some luck we'll see each IP often enough to prevent its server from being
disabled...

That said, I do agree with you, we may need to implement DNS over TCP at
some point. We are just waiting for some more feedback on this point.
Note that I may soon implement EDNS, to announce that HAProxy can accept
bigger DNS responses. This is not ideal, but it may be used as a quick
and dirty workaround until we have something more reliable.

> These just as "quick" feedback, depending on the time I'll have I'll try
> to take a closer look at a few things and provide more details if
> possible.
>
> Again, thanks a lot for working on this, let me know if you are
> interested in any specific details.

You're welcome. I'm mostly interested in how many SRV records you can get
at most in a response. This will be very helpful.

Baptiste
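PS: to get a rough idea of how many SRV records fit in a response on your
side, something like this should do the trick (again just a sketch,
reusing your service name and one of your nameservers; +noedns, +ignore
and +bufsize are dig options controlling EDNS and the TCP fallback):

  # plain 512-byte UDP, no TCP retry: a "tc" flag here means the full
  # answer no longer fits without EDNS
  dig +noedns +ignore SRV http.web.production.<internal-name> @10.33.60.31 | grep flags

  # with a 4k EDNS buffer: shows the answer count and the message size
  dig +bufsize=4096 SRV http.web.production.<internal-name> @10.33.60.31 | grep -E 'ANSWER|MSG SIZE'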