Hi,

first of all: great to see that this is making progress! I am very excited
about everything related to SRV records and server-templates. I tested a
fresh master build with these patches applied; here are my observations:

On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
> Hi All
> 
> So, I enabled latest (brilliant) contribution from Olivier into my
> Kubernetes cluster and I discovered it did not work as expected.
> After digging into the issues, I found 3 bugs directly related to the
> way SRV records must be read and processed by HAProxy.
> It was clearly hard to spot them outside a real orchestrator :)
> 
> Please find in attachment 3 patches to fix them.
> 
> Please note that I might have found another bug, which I'll dig into
> later.
> When "scaling in" (reducing an app's footprint in Kubernetes), HAProxy
> considers some servers (pods in Kubernetes) to be in error "no dns
> resolution". This is normal. What is not normal is that those servers
> never ever come back to life, even when I scale up again.
> Note that thanks to (hi!) Fred's contribution about server-templates
> some time ago, we can do some very cool configurations like the one
> below: (I have a headless service called 'red' in my Kubernetes; it
> points to my 'red' application)
> 
> backend red
>   server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check
> 
> In one line, we can enable automatic "scaling follow-up" in HAProxy.

I tried a very similar setup, like this:

>  resolvers servicediscovery
>    nameserver dns1 10.33.60.31:53
>    nameserver dns2 10.33.19.32:53
>    nameserver dns3 10.33.25.28:53
>
>    resolve_retries       3
>    timeout retry         1s
>    hold valid           10s
>    hold obsolete         5s
>
>  backend testbackend
>    server-template test 20 http.web.production.<internal-name>:80 check

This is the first time I am testing the server-template keyword at all, but
I immediately noticed that I sometimes get a rather uneven distribution of
pods, e.g. this (with the name resolving to 5 addresses):

> $ echo "show servers state testbackend" | \
>    nc localhost 2305 | grep testbackend | \
>    awk '{print $5}' | sort | uniq -c
>      7 10.146.112.130
>      6 10.146.148.92
>      3 10.146.172.225
>      4 10.146.89.208

This uses only four of the five servers, with quite an uneven distribution.
Other attempts do use all five servers, but the distribution still seems
pretty uneven most of the time. Is that intentional? Is the list populated
randomly?
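
(In case it matters: port 2305 above is just a plain TCP stats socket,
declared roughly like this; the exact bind address here is an assumption on
my part, not copied from our config:

>  global
>    stats socket ipv4@127.0.0.1:2305 level admin

The "show servers state" dumps are read through that socket.)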

Then, nothing changed when I scaled up or down (apart from the health
checks taking some servers offline); the addresses were never updated. Is
that the bug you mentioned, or am I doing something wrong?
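
For what it's worth, this is roughly how I watched it while scaling (same
socket and backend as in the snippets above, nothing fancy):

> $ while true; do
>     # same address count as above, repeated every two seconds
>     echo "show servers state testbackend" | nc localhost 2305 | \
>       grep testbackend | awk '{print $5}' | sort | uniq -c
>     sleep 2
>   done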

Also, more of a side note: we do use SRV records, but without underscores
in the names, which I realize is not very common, but also not exactly
forbidden (as far as I understand RFC 2782, it is more of a convention than
a requirement). It would be great if this could perhaps be indicated in the
config in some way.
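
To illustrate what I mean (hostnames made up, only the shape of the name
matters):

> # the answer is regular SRV data (priority weight port target); the only
> # difference is that the queried name carries no _service._proto labels
> $ dig +short SRV http.web.production.example.internal
> 10 10 8080 pod-1.example.internal.
> 10 10 8080 pod-2.example.internal.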

And lastly, I know this isn't going to be solved on a Friday afternoon, but
I want to mention that our infrastructure has reached a scale where DNS
over UDP almost never cuts it anymore (due to the number of records
returned). I think many people who turn to e.g. Kubernetes do so because
they have to operate at that kind of scale, so my guess is that support for
DNS over TCP might become one of the more frequently requested features at
some point :)
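
In case it helps to illustrate, this is an easy way to see the issue with
dig (the name is the same placeholder as in my config above):

> # force a classic 512-byte UDP query and don't retry over TCP; a "tc" flag
> # in the header section means the answer no longer fits
> $ dig +noedns +ignore SRV http.web.production.<internal-name> @10.33.60.31
>
> # the full answer only comes back over TCP
> $ dig +tcp SRV http.web.production.<internal-name> @10.33.60.31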

That's just some "quick" feedback; depending on how much time I have, I'll
try to take a closer look at a few things and provide more details if
possible.

Again, thanks a lot for working on this; let me know if you are interested
in any specific details.

Thanks a lot,
Conrad
-- 
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
