Hello Baptiste,
I'm sorry if my comments are blunt, but I think this discussion is
important and I do not want my messages to be ambiguous. I do
appreciate all the work you are doing in the DNS subsystem.

On 21 February 2018 at 18:05, Baptiste <bed...@gmail.com> wrote:
>> However in Haproxy the administrator *explicitly* configures a higher
>> payload size, because this higher payload size is probably actually
>> *required* to get all backend servers. Silently downgrading the
>> payload size is harmful in my opinion here.
>
> Maybe there is a misunderstood here.
> HAProxy already downgrade "silently" the accepted payload size. I mean, this
> "feature" already exists.
> This patch simply ensure the downgrade happens in timeout cases only.

I am aware that this patch merely addresses a corner case of an
already existing feature, and I'm not saying the patch doesn't fix
that corner case. I am saying that I have a strong opinion about the
feature in the first place, and I responded to this thread only
because I was not aware of it previously.

>> Ok, but you can see how ignoring the TC flag, combined with automatic
>> payload size downgrade, will make the number of backend servers
>> fluctuate with a little bit of packet loss? With 40 servers in DNS
>> and the loss of a single packet, we will downgrade the entire backend
>> to whatever fitted in the 1280 byte downgraded response.
>>
>> I would call this behavior highly undesirable.
>
> You can play with "hold obsolete <timer>" to ensure that unseen records are
> kept for <timer> time.

I don't see how this is supposed to address the problem.
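For reference, the truncation condition we are ignoring is a single
bit in the DNS header (RFC 1035); a resolver that honors it retries
the same query over TCP instead of consuming the truncated payload.
A minimal stdlib sketch (function names and the fake headers are mine,
this is not haproxy code):

```python
import struct

# TC (truncation) bit in the 16-bit DNS header flags field (RFC 1035).
DNS_FLAG_TC = 0x0200

def is_truncated(response: bytes) -> bool:
    """Return True if the DNS response has the TC bit set."""
    if len(response) < 12:  # a DNS header is 12 bytes
        raise ValueError("short DNS message")
    (flags,) = struct.unpack_from("!H", response, 2)  # flags are bytes 2-3
    return bool(flags & DNS_FLAG_TC)

# Minimal fake headers: id=0x1234, QR=1 (response), zero record counts.
# 0x8200 = QR | TC; 0x8000 = QR only.
truncated_hdr = struct.pack("!HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
clean_hdr = struct.pack("!HHHHHH", 0x1234, 0x8000, 1, 0, 0, 0)

print(is_truncated(truncated_hdr))  # True: should retry over TCP
print(is_truncated(clean_hdr))      # False: response is complete
```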
The payload size is downgraded permanently (as I've found out below),
not only for the retried request, so we will forever use 1280 bytes,
which will not contain the complete server set (after all, that's why
the admin raised the payload size in the first place). We will be
missing - possibly a lot of - backend servers until we reload haproxy,
and that will only work until the next DNS packet is lost, at which
point haproxy degrades again.

>> Failing over to TCP on TC responses is the proper solution to this
>> all, but in the meantime I would argue that we should make the
>> behavior as predictable and stable as we can.
>>
>> Let me know what you think,
>
> I think that until we can do DNS over TCP, it is safer to downgrade the
> announced accepted payload size in case of a timeout and ensure we'll still
> receive answer than never ever downgrade and stop receiving answers.

I disagree (strongly). We can't do anything about TC and DNS over TCP
in haproxy 1.8. But it is my opinion that auto-downgrading the
accepted payload size, while ignoring TC, is problematic in *a lot* of
situations, with a very questionable benefit.

When is the auto-downgrade supposed to help? When we have fluctuating
PathMTU issues in a datacenter, I think we should fail hard and fast,
not hide the problem. Other than hiding PathMTU problems, it will
kneecap the load-balancer when a single IP fragment of a DNS response
is lost, by reducing the number of available servers.

Sure, if we were writing lookup code for a recursive DNS server, we
should implement this fallback in every case. But we would also have
proper TC handling in that case, so we would not work with truncated
responses, and we would certainly not permanently downgrade our
ability to get large responses.

> As I explained above, this "feature" (downgrade) already exists in HAProxy,
> but is applied to too many cases.
> This patch simply ensure that it is applied to timeouts only.
> So I don't see what's wrong with this patch.

I don't disagree with the patch, I disagree with the feature; I am
fully aware that it already exists. I just was not aware of it before
you sent the patch.

You mentioned an RFC suggestion, and I believe RFC 6891 #6.2.5 may be
what you are talking about:
https://tools.ietf.org/html/rfc6891#section-6.2.5

And it indeed suggests falling back to a lower payload size, however:

- it is a MAY: "A requestor MAY choose to implement a fallback"
- it is kind-of implied (when reading the entire paragraph) that this
  is only relevant when the *default* payload size is above 1280 (so
  not relevant to haproxy: we don't default above 1280, we don't even
  enable EDNS0 by default); it is imo irrelevant when the admin is
  actually configuring the payload size
- it's obvious that the TCP fallback on TC responses has to work
- it imo also implies a fallback for the next retry request, not a
  fallback for the entire runtime of the application

> - create a new command on the CLI to set the accepted payload to a value
> decided by the admin (so he can perform an upgrade in case a downgrade
> happened)

I don't understand; are you saying the feature currently downgrades
the payload size for the resolver *permanently*, not just for the next
retry? Therefore, when haproxy downgrades *currently*, we have to
actually reload haproxy to restore the accepted payload size?

Well, that at least fully explains what the user reported here:
https://discourse.haproxy.org/t/haproxy-1-8-2-1-8-3-dns-auto-discover-stop-working/2014/

> Now, I agree the "silent" application of the downgrade could be problematic
> and we could imagine some solutions to fix this:

Number 1 on this list should be to document this behavior in the doc
for the accepted_payload_size keyword (and while there we should also
mention that we do not fall back to TCP on TC responses).
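To make the per-retry interpretation concrete: the advertised payload
size is just a field in the EDNS0 OPT pseudo-RR that each query
carries (RFC 6891 puts it in the CLASS field), so nothing on the wire
forces a downgrade to outlive a single retry. A rough stdlib sketch of
what a per-retry fallback would look like (names and sizes are mine,
not haproxy code):

```python
import struct

EDNS0_OPT_TYPE = 41  # OPT pseudo-RR type (RFC 6891)

def build_opt_rr(payload_size: int) -> bytes:
    """Build a minimal EDNS0 OPT record for a query's additional section.

    The advertised UDP payload size rides in the CLASS field; this is
    the value haproxy's accepted_payload_size keyword configures.
    """
    return struct.pack(
        "!BHHIH",
        0,                # NAME: root domain (single zero byte)
        EDNS0_OPT_TYPE,   # TYPE: OPT
        payload_size,     # CLASS: requestor's UDP payload size
        0,                # TTL: extended RCODE and flags, all zero here
        0,                # RDLENGTH: no EDNS options attached
    )

def advertised_payload_size(opt_rr: bytes) -> int:
    """Read the payload size back out of an OPT record."""
    (size,) = struct.unpack_from("!H", opt_rr, 3)  # after NAME + TYPE
    return size

# Per-retry fallback (my reading of RFC 6891 #6.2.5): only the retried
# query advertises the smaller size; the next regular query goes back
# to the configured value instead of being downgraded forever.
configured = 8192
retry_opt = build_opt_rr(1280)         # one retry at the fallback size
normal_opt = build_opt_rr(configured)  # subsequent queries are back to normal
```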
Logging is fine; I completely disagree on the CLI keywords: rather
than making the admin jump through hoops to work around, via scripting
on the admin socket, the harm this feature causes, we should make the
feature itself configurable. I still think we should remove it
entirely, though.

Baptiste, I don't think you'd find the symptoms I have in mind
acceptable on a load-balancer, so there has to be a misunderstanding
here. I would like to do some tests; maybe I can come up with a simple
testcase that shows the behavior, and then we can review the situation
based on that testcase. I will probably need a few days for this,
though.

Thanks, and again, I really appreciate all the work on the DNS
resolver you are doing.

Lukas