On Wed, Feb 21, 2018 at 11:07 AM, Lukas Tribus <lu...@ltri.eu> wrote:
> Hello Baptiste, > > > On 21 February 2018 at 08:45, Baptiste <bed...@gmail.com> wrote: > >> Is this downgrade at good thing in the first place? Doesn't it hide > >> configuration and network issues, make troubleshooting more complex > >> and the haproxy behavior less predictable? > > > > > > It is an rfc recommendation (rfc number is commented somewhere in the > source > > code, but I am on a mobile and can't access it). > > Its purpose is to hide networking issues when responses has to cross the > > internet and behavior is not predictable. > > And I can see how this would be useful in end-user situations, > browser, smartphone apps, etc. > > However in Haproxy the administrator *explicitly* configures a higher > payload size, because this higher payload size is probably actually > *required* to get all backend servers. Silently downgrading the > payload size is harmful in my opinion here. > > Maybe there is a misunderstood here. HAProxy already downgrade "silently" the accepted payload size. I mean, this "feature" already exists. This patch simply ensure the downgrade happens in timeout cases only. > >> When we see a response with the TC flag set, do we drop it or do we > >> still consider the DNS response? > > > > Haproxy ignores tc flag for now. > > Ok, but you can see how ignoring the TC flag, combined with automatic > payload size downgrade will make the backend servers number fluctuate > with a little bit of packet loss? So with 40 servers in DNS and the > loss of a single packet we will downgrade the entire backend to > whatever fitted in the 1280 byte downgraded response. > > I would call this behavior highly undesirable. > > You can play with "hold obsolete <timer>" to ensure that unseen records are kept for <timer> time. > > Note: networks/firewalls/router may more likely drop fragmented IP > packets than "normal" IP packets (as they may try to reassemble them > to actually apply layer 4+ ACLs, which requires buffers, which can > overflow) and this makes it even worse. > > > > > Later, it will have to trigger a failover to tcp. > > Failing over to TCP on TC responses is the proper solution to this > all, but in the meantime I would argue that we should make the > behavior as predictable and stable as we can. > Let me know what you think, > > I think that until we can do DNS over TCP, it is safer to downgrade the announced accepted payload size in case of a timeout and ensure we'll still receive answer than never ever downgrade and stop receiving answers. As I explained above, this "feature" (downgrade) already exists in HAProxy, but is applied to too many cases. This patch simply ensure that it is applied to timeouts only. So I don't see what's wrong with this patch. Now, I agree the "silent" application of the downgrade could be problematic and we could imagine some solutions to fix this: - emit a WARNING message / log when it happens - create a new command on the CLI to display current accepted payload - create a new command on the CLI to set the accepted payload to a value decided by the admin (so he can perform an upgrade in case a downgrade happened) Any other suggestion is welcome. Baptiste