Hi,

first of all: great to see that this is making progress! I am very
excited about everything related to SRV records and also
server-templates. I tested a fresh master build with these patches
applied; here are my observations:

On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
> Hi All
>
> So, I enabled latest (brilliant) contribution from Olivier into my
> Kubernetes cluster and I discovered it did not work as expected.
> After digging into the issues, I found 3 bugs directly related to the
> way SRV records must be read and processed by HAProxy.
> It was clearly hard to spot them outside a real orchestrator :)
>
> Please find in attachment 3 patches to fix them.
>
> Please note that I might have found an other bug, that I'll dig into
> later.
> When "scalling in" (reducing an app footprint in kubernetes), HAProxy
> considers some servers (pods in kubernetes) in error "no dns
> resolution". This is normal. What is not normal is that those servers
> never ever come back to live, even when I scale up again
>
> Note that thanks to (Salut) Fred contribution about server-templates
> some time ago, we can do some very cool fancy configurations like the
> one below: (I have a headless service called 'red' in my kubernetes, it
> points to my 'red' application)
>
> backend red
>   server-template red 20 _http._tcp.red.default.svc.cluster.local:8080
> inter 1s resolvers kube check
>
> In one line, we can enable automatic "scalling follow-up" in HAProxy.

I tried a very similar setup, like this:

> resolvers servicediscovery
>   nameserver dns1 10.33.60.31:53
>   nameserver dns2 10.33.19.32:53
>   nameserver dns3 10.33.25.28:53
>
>   resolve_retries 3
>   timeout retry 1s
>   hold valid 10s
>   hold obsolete 5s
>
> backend testbackend
>   server-template test 20 http.web.production.<internal-name>:80 check

This is the first time I am testing the server-template keyword at all,
but I immediately noticed that I sometimes get a rather uneven
distribution of pods, e.g. this (with the name resolving to 5
addresses):

> $ echo "show servers state testbackend" | \
>     nc localhost 2305 | grep testbackend | \
>     awk '{print $5}' | sort | uniq -c
>       7 10.146.112.130
>       6 10.146.148.92
>       3 10.146.172.225
>       4 10.146.89.208

This uses only four of the five servers, and spreads the 20 template
slots quite unevenly across them. Other attempts do use all five
servers, but the distribution still seems pretty uneven most of the
time. Is that intentional? Is the list populated randomly?

Also, nothing changed when I scaled up or down: the health checks took
some servers offline, but the addresses were never updated. Is that the
bug you mentioned, or am I doing it wrong?

As more of a side note, we do use SRV records, but without underscores
in the names, which I realize is not very common, but also not exactly
forbidden (as far as I understand the RFC, it's more of a suggestion).
It would be great if this could be indicated in some way in the config,
maybe.

And lastly, I know this isn't going to be solved on a Friday afternoon,
but I'll let you know that our infrastructure has reached a scale where
DNS over UDP almost never cuts it anymore (due to the number of records
returned), and I think many people who are turning to e.g. Kubernetes do
so because they have to operate at such scale, so my guess is this might
be one of the more frequently requested features at some point :)

This is just "quick" feedback; depending on the time I have, I'll try to
take a closer look at a few things and provide more details if possible.

Again, thanks a lot for working on this, let me know if you are
interested in any specific details.

Thanks a lot,
Conrad

--
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
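
For anyone who wants to repeat the address tally from the message above
after each scaling step, it could be wrapped in a small shell function.
This is only a sketch: summarize_slots and its two arguments are
hypothetical names, not part of HAProxy, and it assumes (like the awk
call above) that field 2 of "show servers state" is the backend name and
field 5 the server address.

```shell
#!/bin/sh
# Hypothetical helper: reads "show servers state" output on stdin and
# reports how many template slots point at each resolved address.
#   $1  backend name to filter on
#   $2  number of addresses the DNS name is expected to resolve to
summarize_slots() {
    awk -v be="$1" -v want="$2" '
        # Keep only rows of the requested backend; tally field 5 (address).
        $2 == be { count[$5]++; total++ }
        END {
            # Print "address slots" pairs in sorted order.
            for (addr in count) {
                distinct++
                printf "%s %d\n", addr, count[addr] | "sort"
            }
            close("sort")
            # One summary line: makes a missing address easy to spot.
            printf "slots=%d distinct=%d expected=%d\n", total, distinct, want
        }'
}
```

With the socket and names from the message above, it would be called as:
echo "show servers state testbackend" | nc localhost 2305 | summarize_slots testbackend 5
A "distinct" value lower than "expected" corresponds to the case
described above, where only four of the five addresses were in use.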