Hm, I'm still struggling to understand why this is a problem as an option for operators, sorry if I'm being dense!
> The server state is a perfect example of huge breakage that would be
> caused by such a change. In short, you start with the names from the
> configuration file, which are looked up in the state file, then the
> names are changed, and once dumped, they are different. So upon next
> reload, nothing matches at all.

As long as the configuration file is updated before updating the name, this
should be fine, shouldn't it? I was under the impression that best practice
for using the dynamic HAProxy server features was to always write out the
modified configuration file when sending socket commands. That way reloads
are consistent, and if someone really needs to know exactly what is
happening right now in the running process, that's what "show stat" or
"show servers state" is for.

I'm also struggling to see why the state file is an issue: my understanding
is that you save the server state file right before a reload, so why would
you send a name update after you save the server state? FWIW I've tested
the patch and "show servers state" correctly displays the updated name [1],
but I don't think that's what you are concerned about...
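For reference, this is the save/reload sequence I have in mind; the paths
here are just examples from our setup, nothing canonical:

    # haproxy.cfg
    global
        stats socket /var/run/haproxy.sock mode 600 level admin
        server-state-file /var/lib/haproxy/server-state

    defaults
        load-server-state-from-file global

    # reload wrapper: dump the state *after* all socket updates and right
    # before the reload, so the dumped names match the rewritten config
    echo "show servers state" | socat stdio /var/run/haproxy.sock \
        > /var/lib/haproxy/server-state
    haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)

So as long as nothing renames a server between the dump and the reload, the
names in the state file and the rewritten config should always agree.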
> For me it's a more general problem of inconsistency between the configuration
> and what is running. For example if you have this :

I very much agree, I just thought that we were already going down this path
with dynamically updating addrs and using server templates (which introduce
config-runtime drift as well). As long as whatever is doing the socket
manipulation (in our case synapse) does the file configuration changes
first, shouldn't this be reasonably OK?

>    backend foo
>        use-server srv1 if rule1
>        use-server srv2 if rule2
>        server srv1 1.1.1.1:80
>        server srv2 2.2.2.2:80
>
> Do you know what happens once you'll switch the names over the CLI ? (hint:
> I know).

My understanding is that we'd continue sending to the original servers. I
do see this would be bad, but isn't the requirement on the user to not do
this, specifically: if you have use-server rules, then you have to reload
the config to pick up changes in backends? Furthermore, rule1 or rule2
could refer to hard-coded IP addresses, which would also be wrong if we
updated the addr of the server with "set server addr" as it is.

Also, thinking about this more, if use-server rules dynamically searched
for the server instead of storing it, that might be a nifty way to do
something like primary/replica failover:

    backend foo
        use-server leader if write_operation
        server leader 1.1.1.1:1234
        server follower1 2.2.2.2:1234
        server follower2 3.3.3.3:1234

Then to do a leader failover you would just set the leader to a follower
(during which time you 503 write requests), and promote a follower to a
leader. I think named backends with a use_backend ACL is probably the
_better_ way to do this, but it's an interesting thought.
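For what it's worth, with the commands that exist today the failover step
could be driven entirely over the socket, something like the following
(assuming the config sketch above, and sidestepping the question of whether
use-server would actually follow the change):

    # point the "leader" slot at follower1's address (promotion)
    echo "set server foo/leader addr 2.2.2.2" | \
        socat stdio /var/run/haproxy.sock
    # take the old follower1 entry out of rotation while it is the leader
    echo "set server foo/follower1 state maint" | \
        socat stdio /var/run/haproxy.sock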
> And it will even be inconsistent with what use_backend does because
> use_backend *can* resolve at runtime instead of config time. The logs will
> return the current name depending on the last change, possibly conflicting
> with what is in the configuration. Another example :
>
>    backend foo1
>        server srv1 1.1.1.1:80 track foo2/srv1
>        server srv2 2.2.2.2:80 track foo2/srv2
>
>    backend foo2
>        server srv1 1.1.1.1:80 check
>        server srv2 2.2.2.2:80 check
>
> Here after a renaming of the tracked servers, it becomes a mess, you'll
> constantly have to know what's taken at startup time and what's taken at
> runtime.

I agree, but if an administrator is using features like track or
use-server, I think they should pick up a server change with a reload
rather than through the dynamic socket. Similar to how, if they are using
IP- or port-based ACLs, they should probably reload instead of setting the
address dynamically?

> On the opposite, there's zero backwards compatibility concerns to have
> with adding a description field because we don't have it for servers for
> now, only for backends and frontends.

I see, OK, I will see if this can work.

>> and I'm still concerned about how does
>> automation go from address to name going forward.
>
> Just a question, wouldn't your requirement in fact be to lookup a server
> by its address ? I mean, wouldn't it be easier for your use case ? If so
> I'm pretty sure that's something we could easily apply. Either we apply
> this to the first server that matches the address, or we apply it to all
> those which match, depending on the difficulty.
>
> For example wouldn't it be easier for you to be able to do :
>
>    set server addr bk/@1.1.1.1:80 2.2.2.2:80
>
> or :
>
>    disable server bk/@1.1.1.1:80
>
> ?

Interesting syntax. I think if we added that to all the "set server"
commands it could work (we'd have to port from using enable/disable server
to "set server state", but that could work) for all but the strangest edge
case in synapse. For context, synapse generates HAProxy server names from
host:port plus a user-provided "name" (which is typically the FQDN, but
some users use it to double-register the same physical server, which this
syntax would no longer allow). It does seem to me though like we're doing a
lot more work than we need (but that's probably because I don't understand
why updating the name is bad yet, heh).
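To make that edge case concrete, here's a sketch (the name format is
illustrative, not synapse's exact one) of where the address lookup would be
ambiguous:

    # two logical services double-registered on one physical host:port;
    # today the distinct names let synapse manage them independently
    backend b_svc
        server svc-a_10.0.0.1_8080 10.0.0.1:8080 check
        server svc-b_10.0.0.1_8080 10.0.0.1:8080 check

    # with the proposed @address form, e.g.:
    #     disable server b_svc/@10.0.0.1:8080
    # both servers match, so we'd need "apply to all that match" semantics
    # (or give up the double-registration case)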
> I'm pretty sure you have a very valid use case that we probably need to
> discuss more to find the largest intersection with what can reasonably be
> done here without breaking existing setups. We know that the management
> socket needs to be significantly improved for some maintenance operations,
> so it's even more important to know what your ideal usage scenario would
> be to see how it could be mapped here.

The usage scenario is microservices: we have a bunch of service instances
(containers) which are constantly churning around physical machines as
applications autoscale, deploy, fail, get preempted, get bin-packed onto
other machines, etc. This is pretty common in any Mesos- or Kubernetes-like
service deployment. Right now we manage the churn with synapse, which gets
dynamic push updates from a service registry (zookeeper, dns, etc.) and
automatically reconfigures (within seconds) and reloads (within minutes)
HAProxy across a fleet of a few thousand machines.

To give you an idea of the kind of churn, synapse handles about 10-20
thousand socket updates (mostly servers going in and out of maintenance)
and about 200 reloads per day per machine. We mostly reload to pick up new
servers, and although synapse tries to batch server changes into one-minute
groups (so we aggregate all the changes within a minute), we still get
about 200 restarts per day, and for something as simple as scaling up a
backend by one server we have to wait a minute for HAProxies everywhere to
pick that up.

Ideally we could add servers just as fast as we can down/up them, which I
thought might be possible with the new dynamic "set server addr" directive,
but if we can't change the name then synapse can't map its notion of a
server to HAProxy's notion of a server, and I don't know how to make it
work. Synapse doesn't support fancy per-server features like use-server
and server tracking since they're really not needed in the microservice
world: you just need a backend pool that dynamically updates as fast as
possible as containers migrate around machines.
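If renaming is off the table, one workaround we've sketched (assuming the
server-template style of pre-provisioning mentioned above is available) is
to drive everything by slot, address and state, never by a meaningful name:

    # pre-provision more slots than we expect to need, all starting down
    backend b_service
        server-template srv 20 0.0.0.0:80 check disabled

    # when a new container appears, claim a free slot over the socket
    echo "set server b_service/srv3 addr 10.4.2.7 port 31337" | \
        socat stdio /var/run/haproxy.sock
    echo "set server b_service/srv3 state ready" | \
        socat stdio /var/run/haproxy.sock

    # the catch: synapse still has to persist its own slot <-> container
    # mapping, which is exactly what renaming the server would avoid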
> It's unrelated, I meant the "char *old_name" in the middle of the code.

Ah, right. I've attached a modified patch which is a little safer per your
suggestions, but it sounds like you're against updating the name, so I will
probably throw that out. Also, you can find an example of what "show
servers state" looks like before and after dynamically updating here [1].

I'm still not quite understanding why allowing the name to change is going
to cause breakage for users that are choosing to change the name
dynamically, but I do really appreciate all the context you're giving
me :-)

-Joey

[1] https://gist.github.com/jolynch/43fea470b9b5ab86bfac4969ee035fa6

0001-MINOR-cli-ability-to-change-a-server-s-name.patch