Ah yes, I also added the "init-addr none" statement on the server-template line. This prevents HAProxy from using the libc resolver at startup, which might lead to unpredictable behavior in that environment.
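For illustration, here is a minimal sketch of how such a server-template line could sit next to the resolvers section quoted below. The backend name, the "app" prefix, the slot count of 10 and the SRV name _http._tcp.myservice.local are placeholders chosen for the example, not values from the actual setup; NAMESERVER stands for the address of the AWS DNS server.

```haproxy
# Sketch only: anything marked "placeholder" is an assumption, not the real config.
resolvers awsdns
    nameserver dns0 NAMESERVER:53      # replace NAMESERVER with the AWS DNS server address
    accepted_payload_size 8192         # accept DNS responses larger than the 512-byte default
    hold obsolete 30s                  # keep servers missing from the SRV answer for 30s

backend be_app                         # placeholder backend name
    # 10 server slots filled from the SRV record (placeholder name).
    # "init-addr none" starts the slots without an address instead of
    # asking the libc resolver at startup.
    server-template app 10 _http._tcp.myservice.local resolvers awsdns init-addr none
```

With "init-addr none" the slots start without an address, and therefore down, until the first SRV response arrives through the resolvers section.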
Baptiste

On Tue, Jul 3, 2018 at 3:18 PM, Baptiste <[email protected]> wrote:

> Well, I can partially reproduce the issue you're facing and I can see some
> weird behavior of AWS's DNS servers.
>
> First, by default, HAProxy only supports DNS over UDP and accepts up to
> 512 bytes of payload in the DNS response.
> DNS over TCP is not yet available, but the accepted payload size can be
> increased using the EDNS0 extension.
>
> There is a "magic" number of SRV records with AWS and the default HAProxy
> accepted payload size: at around 4 SRV records, the response payload may be
> bigger than 512 bytes.
> In that case, the AWS DNS server does not return any data; it simply returns
> an empty response with the TRUNCATED flag set.
> In such a case, a client is supposed to replay the request over TCP...
>
> Another magic value with AWS DNS servers is that they won't return more
> than 8 SRV records, even if you have 10 servers in your service (even over
> TCP).
> AWS DNS servers will simply return a round-robin list of the records: some
> will disappear, some will reappear at some point in time.
>
> In conclusion, to make HAProxy work in such an environment, you want to
> configure it this way:
>
>   resolvers awsdns
>     nameserver dns0 NAMESERVER:53   # <=== please remove the double quotes
>     accepted_payload_size 8192      # <=== workaround for too short accepted payload
>     hold obsolete 30s               # <=== workaround for limited number of records returned by AWS
>
> You may want to read the documentation of HAProxy's resolvers. There are a
> few other timeout / hold periods you could tune.
>
> With the configuration above, I could easily scale from 2 to 10, back to
> 2, passing through 4, 8, etc... successfully and without any server
> flapping.
> I did not try to go higher than 10. Bear in mind the "hold obsolete"
> period is the period during which HAProxy considers a server as available
> even if the DNS server did not return it in the SRV record list.
>
> Baptiste
>
> On Tue, Jul 3, 2018 at 1:26 PM, Baptiste <[email protected]> wrote:
>
>> Answering myself... I found my way in the menu to be able to allow port
>> 9000 to read the stats page and to find the public IP associated to my
>> "app".
>> That said, I still can't get a shell on the running container, but I
>> think I found an AWS documentation page for this purpose.
>>
>> I'll keep you updated.
>>
>> On Tue, Jul 3, 2018 at 1:06 PM, Baptiste <[email protected]> wrote:
>>
>>> Hi Jim,
>>>
>>> I think I have something running...
>>> At least, terraform did not complain and I can see "stuff" in my AWS
>>> dashboard.
>>> Now, I have no idea how I can get connected to my running HAProxy
>>> container, nor how I can troubleshoot what's happening :)
>>>
>>> Any help would be (again) appreciated.
>>>
>>> Baptiste
>>>
>>> On Tue, Jul 3, 2018 at 11:39 AM, Baptiste <[email protected]> wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> Sorry for the long pause :)
>>>> I was dealing with some travel, conferences and catching up on my
>>>> backlog.
>>>> So, the good news is that this issue is now my priority :)
>>>>
>>>> I'll try to first reproduce it and come back to you if I have any issue
>>>> during that step.
>>>> (by the way, thanks for the github repo to help me speed up that
>>>> step).
>>>>
>>>> Baptiste
>>>>
>>>> On Mon, Jun 25, 2018 at 10:54 PM, Jim Deville <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Baptiste,
>>>>>
>>>>> I just wanted to follow up to see if you were able to repro and
>>>>> perhaps had a patch we could try?
>>>>>
>>>>> Jim
>>>>> ------------------------------
>>>>> *From:* Jim Deville
>>>>> *Sent:* Thursday, June 21, 2018 1:05:49 PM
>>>>> *To:* Baptiste
>>>>> *Cc:* [email protected]; Jonathan Works
>>>>> *Subject:* Re: Issue with parsing DNS from AWS
>>>>>
>>>>> Thanks for the reply, we were able to extract a minimal repro to
>>>>> demonstrate the problem:
>>>>> https://github.com/jgworks/haproxy-servicediscovery
>>>>>
>>>>> The docker folder contains a version of the config we're using and a
>>>>> startup script to determine the local private DNS zone (AWS puts it at the
>>>>> subnet's +2).
>>>>>
>>>>> Jim
>>>>> ------------------------------
>>>>> *From:* Baptiste <[email protected]>
>>>>> *Sent:* Thursday, June 21, 2018 11:02:26 AM
>>>>> *To:* Jim Deville
>>>>> *Cc:* [email protected]; Jonathan Works
>>>>> *Subject:* Re: Issue with parsing DNS from AWS
>>>>>
>>>>> and by the way, I had a quick look at the pcap file and could not find
>>>>> anything weird.
>>>>> The function you're pointing to seems to say there is not enough space to
>>>>> store a server's DNS name, but the allocated space is larger than your
>>>>> current records.
>>>>>
>>>>> Baptiste

